Opportunistic Cooperation in Cognitive Femtocell Networks

Rahul Urgaonkar, Michael J. Neely 






Abstract — We investigate opportunistic cooperation between 
unlicensed secondary users and legacy primary users in a 
cognitive radio network. Specifically, we consider a model of 
a cognitive network where a secondary user can cooperatively 
transmit with the primary user in order to improve the latter's 
effective transmission rate. In return, the secondary user gets 
more opportunities for transmitting its own data when the 
primary user is idle. This kind of interaction between the primary 
and secondary users is different from the traditional dynamic 
spectrum access model in which the secondary users try to avoid 
interfering with the primary users while seeking transmission 
opportunities on vacant primary channels. In our model, the 
secondary users need to balance the desire to cooperate more 
(to create more transmission opportunities) with the need for 
maintaining sufficient energy levels for their own transmissions. 
Such a model is applicable in the emerging area of cognitive 
femtocell networks. We formulate the problem of maximizing 
the secondary user throughput subject to a time average power 
constraint under these settings. This is a constrained Markov 
Decision Problem and conventional solution techniques based 
on dynamic programming require either extensive knowledge of 
the system dynamics or learning based approaches that suffer 
from large convergence times. However, using the technique of 
Lyapunov optimization, we design a novel greedy and online 
control algorithm that overcomes these challenges and is provably 
optimal. 

Index Terms — Resource Allocation, Opportunistic Coopera- 
tion, Cognitive Radio, Femtocell Networks, Optimal Control 

I. Introduction 

Much prior work on resource allocation in cognitive radio
networks has focused on the dynamic spectrum access
model [1], in which the secondary users seek transmission
opportunities for their packets on vacant primary channels 
in frequency, time, or space. Under this model, the primary 
users are assumed to be oblivious of the presence of the 
secondary users and transmit whenever they have data to send. 
Secondly, a collision model is assumed for the physical layer 
in which if a secondary user transmits on a busy primary 
channel, then there is a collision and both packets are lost. 
We considered a similar model in our prior work [3], where
the objective was to design an opportunistic scheduling policy 
for the secondary users that maximizes their throughput utility 
while providing tight reliability guarantees on the maximum 

Rahul Urgaonkar and Michael J. Neely are with the Department 
of Electrical Engineering, University of Southern California, Los An- 
geles, CA 90089. E-mail: urgaonka@usc.edu, mjneely@usc.edu, Web: 
http://www-scf.usc.edu/~urgaonka 

This material is supported in part by one or more of the following: the 
DARPA IT-MANET program grant W911NF-07-0028, the NSF Career grant
CCF-0747525, and continuing through participation in the Network Science 
Collaborative Technology Alliance sponsored by the U.S. Army Research 
Laboratory. 



number of collisions suffered by a primary user over any given 
time interval. We note that this formulation does not consider 
the possibility of any cooperation between the primary and 
secondary users. Further, it assumes that the secondary user 
activity does not affect the primary user channel occupancy 
process. 

There is a growing body of work that investigates alternate 
models for the interaction between the primary and secondary 
users in a cognitive radio network. In particular, the idea of 
cooperation at the physical layer has been considered from an 
information-theoretic perspective in many works (see [4] and 
the references therein). These are motivated by the work on 
the classical interference and relay channels [5]-[8]. The main
idea in these works is that the resources of the secondary user 
can be utilized to improve the performance of the primary 
transmissions. In return, the secondary user can obtain more 
transmission opportunities for its own data when the primary 
channel is idle. 

These works mainly treat the problem from a physical 
layer/information-theoretic perspective and do not consider 
upper layer issues such as queueing dynamics, higher priority 
for primary user, etc. Recent work that addresses some of 
these issues includes [9]-[13]. Specifically, [9] considers the
scenario where the secondary user acts as a relay for those
packets of the primary user that it receives successfully but
which are not received by the primary destination. It derives
the stable throughput of the secondary user under this model.
[10], [11] use a Stackelberg game framework to study spectrum
leasing strategies in cooperative cognitive radio networks
where the primary users lease a portion of their licensed
spectrum to secondary users in return for cooperative relaying.
[12], [13] study and compare different physical layer strategies
for relaying in such cognitive cooperative systems. An im- 
portant consequence of this interaction between the primary 
and secondary users is that the secondary user activity can 
now potentially influence the primary user channel occupancy 
process. However, there has been little work in studying this 
scenario. Exceptions include the work in [14] that considers a
two-user setting where collisions caused by the opportunistic 
transmissions of the secondary user result in retransmissions 
by the primary user. 

In this paper, we study the problem of opportunistic co- 
operation in cognitive networks from a network utility maxi- 
mization perspective, specifically taking into account the above 
mentioned higher-layer aspects. To motivate the problem and 
illustrate the design issues involved, we first consider a simple 
network consisting of one primary and one secondary user 
and their respective access points in Sec. II. This can model
a practical scenario of recent interest, namely a cognitive
femtocell [15], [16], as discussed in Sec. II. We assume that
the secondary user can cooperatively transmit with the primary
user to increase the primary user's transmission success
probability. In return, the secondary user can get more
opportunities for transmitting its own data when the primary
user is idle. We formulate the problem of maximizing the
secondary user throughput subject to time average power
constraints in Sec. II-B.

Unlike most of the prior work on resource allocation in 
cognitive radio networks, the evolution of the system state 
for this problem depends on the control actions taken by 
the secondary user. Here, the system state refers to the 
channel occupancy state of the primary user. Because of 
this dependence, this problem becomes a constrained Markov 
Decision Problem (MDP) and the greedy "drift-plus-penalty"
minimization technique of Lyapunov optimization [17] that
we used in [3] is no longer optimal. Such problems are
typically tackled using Markov Decision Theory and Dynamic
Programming [23], [24]. For example, [14] uses these tools to
derive structural results on optimal channel access strategies 
in a similar two-user setting where collisions caused by the 
opportunistic transmissions of the secondary user cause the 
primary user to retransmit its packets. However, this approach 
requires either extensive knowledge of the dynamics of the un- 
derlying network state (such as state transition probabilities) or 
learning based approaches that suffer from large convergence 
times. 

Instead, in Sec. III, we use the recently developed framework
of maximizing the ratio of the expected total reward over
the expected length of a renewal frame [19]-[21] to design a
control algorithm. This framework extends the classical Lyapunov
optimization method [17] to tackle a more general class
of MDP problems where the system evolves over renewals and 
where the length of a renewal frame can be affected by the 
control decisions during that period. The resulting solution has 
the following structure: Rather than minimizing a "drift-plus- 
penalty" term every slot, it minimizes a "drift-plus-penalty 
ratio" over each renewal frame. This can be achieved by 
solving a sequence of unconstrained stochastic shortest path 
(SSP) problems and implementing the solution over every 
renewal frame. 

While solving such SSP problems can be simpler than the 
original constrained MDP, it may still require knowledge of 
the dynamics of the underlying network state. Learning based 
techniques for solving such problems by sampling from the 
past observations have been considered in iPTHl . However, these 
may suffer from large convergence times. Remarkably, in Sec. IV,
we show that for our problem, the "drift-plus-penalty ratio"
method results in an online control algorithm that does not 
require any knowledge of the network dynamics or explicit 
learning, yet is optimal. In this respect, it is similar to the 
traditional greedy "drift-plus-penalty" minimizing algorithms 
of [17]. We then extend the basic model to incorporate multiple
secondary users as well as time-varying channels in Sec. VI.
Finally, we present simulation results in Sec. VII.

Fig. 1. Example femtocell network with primary and secondary users.

II. Basic Model 

We consider a network with one primary user (PU), one 
secondary user (SU) and their respective base stations (BS). 
The primary user is the licensed owner of the channel while 
the secondary user tries to send its own data opportunistically 
when the channel is not being used by the primary user. This 
model can capture a femtocell scenario where the primary user 
is a legacy mobile user that communicates with the macro base 
station over licensed spectrum (Fig. 1). The secondary user is
the femtocell user that does not have any licensed spectrum of 
its own and tries to send data opportunistically to the femtocell 
base station over any vacant licensed spectrum. Similar models 
of cooperative cognitive radio networks have been considered 
in [9]-[13]. This can also model a single server queueing
system with two classes of arrivals where one class has a 
strictly higher priority over the other class. 

We consider a time-slotted model. We assume that the
system operates over a frame-based structure. Specifically,
the timeline can be divided into successive non-overlapping
frames of duration $T[k]$ slots where $k \in \{1, 2, 3, \ldots\}$
represents the frame number (see Fig. 2). The start time of frame $k$
is denoted by $t_k$ with $t_1 = 0$. The length of frame $k$ is given
by $T[k] = t_{k+1} - t_k$. For each $k$, the frame length $T[k]$ is a
random function of the control decisions taken during that
frame. Each frame can be further divided into two periods:
PU Idle and PU Busy. The "PU Idle" period corresponds to
the slots when the primary user does not have any packet to
send to its base station and is idle. The "PU Busy" period
corresponds to the slots when the primary user is transmitting
its packets to its base station over the licensed spectrum. As
shown in Fig. 2, every frame starts with the "PU Idle" period
which is followed by the "PU Busy" period and ends when
the primary user becomes idle again. In the basic model, we
assume that the primary user receives new packets every slot
according to an i.i.d. Bernoulli arrival process $A_{pu}(t)$ with
rate $\lambda_{pu}$ packets/slot. This means that the length of the "PU
Idle" period of any frame is a geometric random variable with
parameter $\lambda_{pu}$. However, the length of the "PU Busy" period
depends on the secondary user control decisions as discussed
below.

In any slot t, if the primary user has a non-zero queue 
backlog, it transmits one packet to its base station. We assume 
that the transmission of each packet takes one slot. If the 
transmission is successful, the packet is removed from the 
primary user queue. However, if the transmission fails, the 
Fig. 2. Frame-based structure of the problem under consideration. Each
frame consists of two periods: PU Idle and PU Busy. 

packet is retained in the queue for future retransmissions. The 
secondary user cannot transmit its packets when the channel 
is being used by the primary user. It can transmit its packets 
only during the "PU Idle" period of the frame and must stop 
its transmission whenever the primary user becomes active 
again. However, the secondary user can transmit cooperatively
with the primary user in the "PU Busy" period to increase
the primary user's transmission success probability. This has
the effect of decreasing the expected length of the "PU Busy" period.
In order to cooperate, the secondary user must allocate its
power resources to help relay the primary user packet. This
cooperation can take place in several ways depending on the
cooperative protocol being used (see [12] for some examples).
In this simple model, these details are captured by the resulting 
probability of successful transmission. 

The reason why the secondary user may want to cooperate 
is because this can potentially increase the number of time 
slots in the future in which the primary user does not have 
any data to send as compared to a non-cooperative strategy. 
This can create more opportunities for the secondary user to 
transmit its own packets. However, note that the trivial strategy 
of cooperating whenever possible may lead to a scenario 
where the secondary user does not have enough power for 
its own data transmission. Thus, the secondary user needs to 
decide whether it should cooperate or not considering these 
two opposing factors. 

The probability of a successful primary transmission de- 
pends on the control actions such as power allocation and 
cooperative transmission decisions by the secondary user. This 
is discussed in detail in the next section. In this model, we 
assume that the network controller cannot control the primary 
user actions. However, it can control the secondary user 
decisions on cooperation and the associated power allocation. 

A. Control Decisions and Queueing Dynamics 

Let $Q_{pu}(t), Q_{su}(t) \in \{0, 1, 2, \ldots\}$ represent the primary
and secondary user queues respectively in slot $t$. New packets
arrive at the secondary user according to an i.i.d. process
$A_{su}(t)$ of rate $\lambda_{su}$ packets/slot. We assume that
there exists a finite constant $A_{max}$ such that $A_{su}(t) \le A_{max}$
for all $t$. Every slot, an admission control decision determines
$R_{su}(t)$, the number of new packets to admit into the secondary
user queue. Further, every slot, depending on whether the
primary user is busy or idle, resource allocation decisions
are made as follows. When $Q_{pu}(t) > 0$, this represents the
secondary user decision on cooperative transmission and the
corresponding power allocation $P_{su}(t)$. When $Q_{pu}(t) = 0$,
this corresponds to the secondary user decision on its own
transmission and the corresponding power allocation $P_{su}(t)$.
We assume that in each slot, the secondary user can choose
its power allocation $P_{su}(t)$ from a set $\mathcal{P}$ of possible options.
Further, this power allocation is subject to a long-term
average power constraint $P_{avg}$ and an instantaneous
peak power constraint $P_{max}$. For example, $\mathcal{P}$ may contain
only two options $\{0, P_{max}\}$, which represent "Remain Idle"
and "Cooperate/Transmit at Full Power". As another example,
$\mathcal{P} = [0, P_{max}]$, so that $P_{su}(t)$ can take any value between
$0$ and $P_{max}$.

Suppose the primary user is active in slot $t$ and the secondary
user allocates power $P(t)$ for cooperative transmission.
Then the random success/failure outcome of the primary
transmission is given by an indicator variable $\mu_{pu}(P(t))$ and
the success probability is given by $\phi(P(t)) = E\{\mu_{pu}(P(t))\}$.
The function $\phi(P)$ is known to the network controller and is
assumed to be non-decreasing in $P$. However, the value of the
random outcome $\mu_{pu}(P(t))$ may not be known beforehand.
Note that setting $P(t) = 0$ corresponds to a non-cooperative
transmission and the success probability for this case becomes
$\phi(0)$, which we denote by $\phi_{nc}$. Likewise, we denote $\phi(P_{max})$
by $\phi_c$. Thus, $\phi_{nc} \le \phi(P(t)) \le \phi_c$ for all $P(t) \in \mathcal{P}$.

We assume that $\lambda_{pu}$ is such that it can be supported even
when the secondary user never cooperates, i.e., $\lambda_{pu} < \phi_{nc}$.
This means that the primary user queue is stable even if there
is no cooperation. Further, for all $k$, the frame length $T[k] \ge 1$
and there exist finite constants $T_{min}$, $T_{max}$ such that under all
control policies, we have:

$$1 \le T_{min} \le E\{T[k]\} \le T_{max}$$

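Specifically, $T_{min}$ can be chosen to be the expected frame
length when the secondary user always cooperates with full
power while $T_{max}$ can be chosen to be the expected frame
length when the secondary user never cooperates. The expected
length of the "PU Idle" period is $1/\lambda_{pu}$, so under full
cooperation the expected length of the "PU Busy" period is
$T_{min} - 1/\lambda_{pu}$. Using Little's Theorem, the fraction of
busy slots equals the primary user utilization, and we have that:

$$\frac{T_{min} - 1/\lambda_{pu}}{T_{min}} = \frac{\lambda_{pu}}{\phi_c}$$

Similarly, we have:

$$\frac{T_{max} - 1/\lambda_{pu}}{T_{max}} = \frac{\lambda_{pu}}{\phi_{nc}}$$

Using these, we have:

$$T_{min} = \frac{\phi_c}{(\phi_c - \lambda_{pu})\lambda_{pu}}, \qquad T_{max} = \frac{\phi_{nc}}{(\phi_{nc} - \lambda_{pu})\lambda_{pu}} \tag{1}$$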
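The frame-length bounds can be checked numerically. The following is a minimal sketch, assuming a fixed per-slot primary success probability `phi` and the closed form $\phi / ((\phi - \lambda_{pu})\lambda_{pu})$ for the expected frame length that follows from the Little's Theorem argument. The function names, the seed, and the sample parameters are illustrative choices, not part of the original model description.

```python
import random


def expected_frame_length(lam_pu, phi):
    """Closed-form expected frame length for a fixed success probability phi."""
    if not 0 < lam_pu < phi <= 1:
        raise ValueError("need 0 < lam_pu < phi <= 1 for a stable primary queue")
    return phi / ((phi - lam_pu) * lam_pu)


def simulate_mean_frame_length(lam_pu, phi, num_frames=20000, seed=0):
    """Monte Carlo estimate of E{T[k]} using the primary queue update
    Q(t+1) = max[Q(t) - mu(t), 0] + A(t) with Bernoulli arrivals and service."""
    rng = random.Random(seed)
    total_slots = 0
    for _ in range(num_frames):
        q = 0              # every frame starts with the primary user idle
        was_busy = False   # becomes True once the "PU Busy" period starts
        while True:
            served = 1 if (q > 0 and rng.random() < phi) else 0
            arrived = 1 if rng.random() < lam_pu else 0
            q = max(q - served, 0) + arrived
            total_slots += 1
            if q > 0:
                was_busy = True
            elif was_busy:
                break      # queue emptied again: the frame is over
    return total_slots / num_frames


# Hypothetical parameters: lam_pu = 0.2, phi_nc = 0.5, phi_c = 0.8.
t_min = expected_frame_length(0.2, 0.8)   # always cooperate at full power
t_max = expected_frame_length(0.2, 0.5)   # never cooperate
```

With these sample values the closed form gives a shorter expected frame under full cooperation than under no cooperation, which is the ordering $T_{min} \le T_{max}$ used in the text.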

Finally, there exists a finite constant $D$ such that the
second moment of the frame size, $E\{T^2[k]\}$, satisfies
the following for all $k$, regardless of the policy:

$$E\{T^2[k]\} \le D \tag{2}$$

This follows from the assumption that the primary user queue
is stable even if there is no cooperation. In Appendix C, we
exactly compute such a $D$ that satisfies (2).

When the primary user is idle in slot $t$ and the secondary
user allocates power $P(t)$ for its own transmission, it gets a
service rate given by $\mu_{su}(P(t))$. This can represent the success
probability of a secondary transmission with a Bernoulli
service process. This can also be used to model more general
service processes. We assume that there exists a finite constant
$\mu_{max}$ such that $\mu_{su}(P) \le \mu_{max}$ for all $P \in \mathcal{P}$.

Given these control decisions, the primary and secondary
user queues evolve as follows:

$$Q_{pu}(t+1) = \max[Q_{pu}(t) - \mu_{pu}(P(t)), 0] + A_{pu}(t) \tag{3}$$
$$Q_{su}(t+1) = \max[Q_{su}(t) - \mu_{su}(P(t)), 0] + R_{su}(t) \tag{4}$$

where $R_{su}(t) \le A_{su}(t)$.
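The one-slot queue updates in (3) and (4) can be sketched as a small helper. This is an illustrative sketch only: the function name and argument names are assumptions, and the caller is assumed to pass a zero secondary service `mu_su` whenever the primary queue is non-empty, as required by the model.

```python
def step_queues(q_pu, q_su, mu_pu, mu_su, a_pu, a_su, r_su):
    """Apply one slot of the queue dynamics (3) and (4).

    mu_pu and mu_su are the realized service amounts in this slot,
    a_pu and a_su are the arrivals, and r_su is the number of
    secondary packets admitted (must satisfy 0 <= r_su <= a_su).
    """
    if not 0 <= r_su <= a_su:
        raise ValueError("admitted packets must satisfy 0 <= r_su <= a_su")
    q_pu_next = max(q_pu - mu_pu, 0) + a_pu   # primary queue, eq. (3)
    q_su_next = max(q_su - mu_su, 0) + r_su   # secondary queue, eq. (4)
    return q_pu_next, q_su_next
```

For example, `step_queues(2, 0, 1, 0, 1, 3, 2)` describes a busy slot in which the primary user serves one packet and receives one, while the secondary user admits two of three arrivals.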

B. Control Objective 

Consider any control algorithm that makes admission control
decision $R_{su}(t)$ and power allocation $P(t)$ every slot
subject to the constraints described in Sec. II-A. Note that
if the primary queue backlog $Q_{pu}(t) > 0$, then this power
is used for cooperative transmission with the primary user. If
$Q_{pu}(t) = 0$, then this power is used for the secondary user's
own transmission. Define the following time-averages under
this algorithm:

$$\overline{R}_{su} \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{R_{su}(\tau)\}$$
$$\overline{\mu}_{su} \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{\mu_{su}(P(\tau))\}$$
$$\overline{P}_{su} \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{P(\tau)\}$$

where the expectations above are with respect to the potential 
randomness of the control algorithm. Assuming for the time 
being that these limits exist, our goal is to design a joint 
admission control and power allocation policy that maximizes 
the throughput of the secondary user subject to its average and 
peak power constraints and the scheduling constraints imposed 
by the basic model. Formally, this can be stated as a stochastic 
optimization problem as follows: 

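$$\begin{aligned}
\text{Maximize:} \quad & \overline{R}_{su} \\
\text{Subject to:} \quad & 0 \le R_{su}(t) \le A_{su}(t) \;\; \forall t \\
& P(t) \in \mathcal{P} \;\; \forall t \\
& \overline{R}_{su} \le \overline{\mu}_{su} \\
& \overline{P}_{su} \le P_{avg}
\end{aligned} \tag{5}$$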

It will be useful to define the primary queue backlog Q pu (t) 
as the "state" for this control problem. This is because the 
state of this queue (being zero or nonzero) affects the control 
options as described before. Note that the control decisions 
on cooperation affect the dynamics of this queue. Therefore, 
problem (5) is an instance of a constrained Markov decision
problem [24]. It is well known that in order to obtain an
optimal control policy, it is sufficient to consider only the class 
of stationary, randomized policies that take control actions 
only as a function of the current system state (and independent 
of past history). A general control policy in this class is 
characterized by a stationary probability distribution over the 
control action set for each system state. Let $\nu^*$ denote the
optimal value of the objective in (5). Then using standard
results on constrained Markov Decision problems [24]-[26],
we have the following:

Lemma 1: (Optimal Stationary, Randomized Policy): There
exists a stationary, randomized policy STAT that takes control
decisions $R_{su}^{stat}(t)$, $P_{su}^{stat}(t)$ every slot purely as a (possibly
randomized) function of the current state $Q_{pu}(t)$ while satisfying
the constraints $R_{su}^{stat}(t) \le A_{su}(t)$, $P_{su}^{stat}(t) \in \mathcal{P}$ for all
$t$ and provides the following guarantees:

$$\overline{R}_{su}^{stat} = \nu^* \tag{6}$$
$$\overline{R}_{su}^{stat} \le \overline{\mu}_{su}^{stat} \tag{7}$$
$$\overline{P}_{su}^{stat} \le P_{avg} \tag{8}$$

where $\overline{R}_{su}^{stat}$, $\overline{\mu}_{su}^{stat}$, $\overline{P}_{su}^{stat}$ denote the time-averages under this
policy.

We note that the conventional techniques to solve (5)
that are based on dynamic programming [23] require either
extensive knowledge of the system dynamics or learning
based approaches that suffer from large convergence times.
Motivated by the recently developed extension to the technique
of Lyapunov optimization in [19]-[21], we take a different
approach to this problem in the next section.

III. Solution Using The "Drift-plus-Penalty" 
Ratio Method 

Recall that the start of the $k$-th frame, $t_k$, is defined as the
first slot when the primary user becomes idle after the "PU
Busy" period of the $(k-1)$-th frame. Let $Q_{su}(t_k)$ denote
the secondary user queue backlog at time $t_k$. Also let $P(t)$
be the power expenditure incurred by the secondary user in
slot $t$. For notational convenience, in the following we will
denote $\mu_{su}(P(t))$ by $\mu_{su}(t)$, noting that the dependence on $P(t)$
is implicit. Then the queueing dynamics of $Q_{su}(t_k)$ satisfy
the following:

$$Q_{su}(t_{k+1}) \le \max\Big[Q_{su}(t_k) - \sum_{t=t_k}^{t_{k+1}-1} \mu_{su}(t), 0\Big] + \sum_{t=t_k}^{t_{k+1}-1} R_{su}(t) \tag{9}$$

where $R_{su}(t)$ denotes the number of new packets admitted in
slot $t$ and $t_{k+1}$ denotes the start of the $(k+1)$-th frame. The
above expression has an inequality because it may be possible
to serve the packets admitted in the $k$-th frame during that
frame itself.

In order to meet the time average power constraint, we make
use of a virtual power queue $X_{su}(t_k)$ [22] which evolves over
frames as follows:

$$X_{su}(t_{k+1}) = \max\Big[X_{su}(t_k) - T[k]P_{avg} + \sum_{t=t_k}^{t_{k+1}-1} P(t), 0\Big] \tag{10}$$

where $T[k] = t_{k+1} - t_k$ is the length of the $k$-th frame. Recall
that $T[k]$ is a (random) function of the control decisions taken
during the $k$-th frame.
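The frame-level update of the virtual power queue in (10) can be sketched as follows. The function name and the representation of a frame as a list of per-slot powers are assumptions made for illustration; the frame length $T[k]$ is taken to be the length of that list.

```python
def update_virtual_power_queue(x_su, frame_powers, p_avg):
    """Frame-level virtual power queue update, following (10).

    x_su         : virtual queue value X_su(t_k) at the start of the frame
    frame_powers : list of powers P(t) spent in each slot of the frame
    p_avg        : long-term average power budget P_avg
    """
    frame_length = len(frame_powers)          # T[k]
    spent = sum(frame_powers)                 # total power used in the frame
    return max(x_su - frame_length * p_avg + spent, 0.0)
```

The queue grows when the power spent in a frame exceeds the budget $T[k] P_{avg}$ and shrinks otherwise, so a large value of $X_{su}(t_k)$ signals that recent frames have overspent.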
In order to construct an optimal dynamic control policy,
we use the technique of [19]-[21] where a ratio of "drift-plus-penalty"
is minimized over every frame. Specifically, let
$Q(t_k) = (Q_{su}(t_k), X_{su}(t_k))$ denote the queueing state of
the system at the start of the $k$-th frame. As a measure of
the congestion in the system, we use a Lyapunov function
$L(Q(t_k)) \triangleq \frac{1}{2}[Q_{su}^2(t_k) + X_{su}^2(t_k)]$. Define the drift $\Delta(t_k)$ as
the conditional expected change in $L(Q(t_k))$ over the frame
$k$:

$$\Delta(t_k) \triangleq E\{L(Q(t_{k+1})) - L(Q(t_k)) \mid Q(t_k)\} \tag{11}$$

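Then, using (9) and (10), we can bound $\Delta(t_k)$ as follows:

$$\begin{aligned}
\Delta(t_k) \le B &- Q_{su}(t_k) E\Big\{\sum_{t=t_k}^{t_{k+1}-1} [\mu_{su}(t) - R_{su}(t)] \,\Big|\, Q(t_k)\Big\} \\
&- X_{su}(t_k) E\Big\{T[k]P_{avg} - \sum_{t=t_k}^{t_{k+1}-1} P(t) \,\Big|\, Q(t_k)\Big\}
\end{aligned} \tag{12}$$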



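where $B$ is a finite constant that satisfies the following for all
$k$ and $Q(t_k)$ under any control algorithm:

$$B \ge \frac{1}{2} E\Big\{\Big(\sum_{t=t_k}^{t_{k+1}-1} \mu_{su}(t)\Big)^2 + \Big(\sum_{t=t_k}^{t_{k+1}-1} R_{su}(t)\Big)^2 + \Big(\sum_{t=t_k}^{t_{k+1}-1} P(t) - T[k]P_{avg}\Big)^2 \,\Big|\, Q(t_k)\Big\}$$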



Using the fact that $\mu_{su}(t) \le \mu_{max}$, $P(t) \le P_{max}$ for all $t$,
and using the fact (2), it follows that choosing $B$ as follows
satisfies the above:

$$B = \frac{D[\mu_{max}^2 + A_{max}^2 + (P_{max} - P_{avg})^2]}{2} \tag{13}$$



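Adding a penalty term $-V E\{\sum_{t=t_k}^{t_{k+1}-1} R_{su}(t) \mid Q(t_k)\}$
(where $V > 0$ is a control parameter that affects a utility-delay
trade-off as shown in Theorem 1) to both sides and
rearranging yields:

$$\begin{aligned}
\Delta(t_k) - V E\Big\{\sum_{t=t_k}^{t_{k+1}-1} R_{su}(t) \,\Big|\, Q(t_k)\Big\} \le B &+ (Q_{su}(t_k) - V) E\Big\{\sum_{t=t_k}^{t_{k+1}-1} R_{su}(t) \,\Big|\, Q(t_k)\Big\} \\
&- X_{su}(t_k) E\{T[k]P_{avg} \mid Q(t_k)\} \\
&- E\Big\{\sum_{t=t_k}^{t_{k+1}-1} \big(Q_{su}(t_k)\mu_{su}(t) - X_{su}(t_k)P(t)\big) \,\Big|\, Q(t_k)\Big\}
\end{aligned} \tag{14}$$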

Minimizing the ratio of an upper bound on the right hand
side of the above expression and the expected frame length
over all control options leads to the following Frame-Based-
Drift-Plus-Penalty-Algorithm. In each frame $k \in \{1, 2, 3, \ldots\}$,
do the following:

1) Admission Control: For all $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$,
choose $R_{su}(t)$ as follows:

$$R_{su}(t) = \begin{cases} A_{su}(t) & \text{if } Q_{su}(t_k) \le V \\ 0 & \text{else} \end{cases} \tag{15}$$



2) Resource Allocation: Choose a policy that maximizes the
following ratio:

$$\frac{E\Big\{\sum_{t=t_k}^{t_{k+1}-1} \big(Q_{su}(t_k)\mu_{su}(t) - X_{su}(t_k)P(t)\big) \,\Big|\, Q(t_k)\Big\}}{E\{T[k] \mid Q(t_k)\}} \tag{16}$$



Specifically, in every slot $t$ of the frame, the policy uses
the queue values $Q_{su}(t_k)$ and $X_{su}(t_k)$ observed at the beginning
of the frame and selects a secondary user power $P(t)$
subject to the constraint $P(t) \in \mathcal{P}$ and the constraint
on transmitting own data vs. cooperation depending on
whether slot $t$ is in the "PU Idle" or "PU Busy" period
of the frame. This is done in such a way that the above
frame-based ratio of expectations is maximized. Recall
that the frame size $T[k]$ is influenced by the policy
through the success probabilities that are determined by
secondary user power selections. Further recall that these
success probabilities are different during the "PU Idle"
and "PU Busy" periods of the frame. An explicit policy
that maximizes this ratio is given in the next
section.

3) Queue Update: After implementing this policy, update the
queues as in (4) and (10).
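The admission control step of the algorithm above is a threshold test on the backlog observed at the start of the frame. A minimal sketch follows, assuming that ties at $Q_{su}(t_k) = V$ are resolved in favor of admission; the function names are illustrative and not taken from the original.

```python
def admit(q_su_frame_start, a_su, v):
    """Threshold admission rule: admit all arrivals in a slot when the
    secondary backlog at the start of the frame is at most V, else none.
    (Admitting on equality is an assumption of this sketch.)"""
    return a_su if q_su_frame_start <= v else 0


def admit_over_frame(q_su_frame_start, arrivals, v):
    """Apply the same decision to every slot of a frame. The decision
    depends only on the backlog at the frame start, so it is constant
    within the frame."""
    return [admit(q_su_frame_start, a, v) for a in arrivals]
```

Note that the rule uses neither the arrival rate of the secondary user nor that of the primary user; only the current arrivals and the frame-start backlog enter the decision.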

From the above, it can be seen that the admission control
part (15) is a simple threshold-based decision that does not
require any knowledge of the arrival rates $\lambda_{su}$ or $\lambda_{pu}$. In the
next section, we present an explicit solution to the maximizing
policy for the resource allocation in (16) and show that, remarkably,
it also does not require knowledge of $\lambda_{su}$ or $\lambda_{pu}$ and
can be computed easily. We will then analyze the performance
of the Frame-Based-Drift-Plus-Penalty-Algorithm in Sec. V.



IV. The Maximizing Policy of (16)

The policy that maximizes (16) uses only two numbers that
we call $P_0^*$ and $P_1^*$, defined as follows. $P_0^*$ is given by the
solution to the following optimization problem:

$$\begin{aligned}
\text{Maximize:} \quad & Q_{su}(t_k)\mu_{su}(P_0) - X_{su}(t_k)P_0 \\
\text{Subject to:} \quad & P_0 \in \mathcal{P}
\end{aligned} \tag{17}$$



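Let $\theta^* \triangleq Q_{su}(t_k)\mu_{su}(P_0^*) - X_{su}(t_k)P_0^*$ denote the value of the
objective of (17) under the optimal solution. Then, $P_1^*$ is given
by the solution to the following optimization problem:

$$\begin{aligned}
\text{Minimize:} \quad & \frac{\theta^* + X_{su}(t_k)P_1}{\phi(P_1)} \\
\text{Subject to:} \quad & P_1 \in \mathcal{P}
\end{aligned} \tag{18}$$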



Note that both (17) and (18) are simple optimization problems
in a single variable and can be solved efficiently. Given $P_0^*$
and $P_1^*$, on every slot $t$ of frame $k$, the policy that maximizes
(16) chooses power $P(t)$ as follows:

$$P(t) = \begin{cases} P_0^* & \text{if } Q_{pu}(t) = 0 \\ P_1^* & \text{if } Q_{pu}(t) > 0 \end{cases} \tag{19}$$



That is, the secondary user uses the constant power $P_0^*$ for
its own transmission during the "PU Idle" period of the frame,
and uses constant power $P_1^*$ for cooperative transmission
during all slots of the "PU Busy" period of the frame. Note
that $P_0^*$ and $P_1^*$ can be computed easily based on the weights
$Q_{su}(t_k)$, $X_{su}(t_k)$ associated with frame $k$, and do not require
knowledge of the arrival rates $\lambda_{su}$, $\lambda_{pu}$.
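For a finite set of power options, the two single-variable problems that define the idle-period power and the busy-period power can be solved by direct enumeration. The sketch below assumes a finite list `power_options`, caller-supplied functions `mu_su` and `phi` (with `phi(p) > 0` for every option), and illustrative function names; none of these names come from the original text.

```python
def maximizing_powers(q_su, x_su, power_options, mu_su, phi):
    """Return (p0, p1, theta) for the current frame weights.

    p0    maximizes q_su * mu_su(p) - x_su * p          (own transmission)
    theta is the optimal value of that objective
    p1    minimizes (theta + x_su * p) / phi(p)          (cooperation)
    """
    def idle_objective(p):
        return q_su * mu_su(p) - x_su * p

    p0 = max(power_options, key=idle_objective)
    theta = idle_objective(p0)
    p1 = min(power_options, key=lambda p: (theta + x_su * p) / phi(p))
    return p0, p1, theta


def choose_power(q_pu, p0, p1):
    """Per-slot rule: use p0 when the primary queue is empty, else p1."""
    return p0 if q_pu == 0 else p1


# Hypothetical two-option example: remain idle or use full power 1.0.
options = [0.0, 1.0]
mu_su = lambda p: 0.9 if p > 0 else 0.0   # assumed secondary success probability
phi = lambda p: 0.8 if p > 0 else 0.5     # assumed primary success probability
p0, p1, theta = maximizing_powers(10, 1.0, options, mu_su, phi)
```

In this example a large data backlog relative to the virtual power queue makes both transmitting and cooperating at full power attractive, whereas with an empty data queue and a positive virtual power queue the same routine selects zero power in both periods.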

Our proof that the above decisions maximize (16) has
the following parts: First, we show that the decisions that
maximize the ratio of expectations in (16) are the same as
the optimal decisions in an equivalent infinite horizon Markov
decision problem (MDP). Next, we show that the solution to
the infinite horizon MDP uses fixed power $P_i$ for each queue
state $Q_{pu}(t) = i$ (for $i \in \{0, 1, 2, \ldots\}$). Then, we show that $P_i$
are the same for all $i \ge 1$. Finally, we show that the optimal
powers $P_0^*$ and $P_1^*$ are given as above. The detailed proof is
given in the next section.

A. Proof Details 

Recall that the Frame-Based-Drift-Plus-Penalty-Algorithm
chooses a policy that maximizes the following ratio over every
frame $k \in \{1, 2, 3, \ldots\}$:

$$\frac{E\Big\{\sum_{t=t_k}^{t_{k+1}-1} \big(Q_{su}(t_k)\mu_{su}(t) - X_{su}(t_k)P(t)\big) \,\Big|\, Q(t_k)\Big\}}{E\{T[k] \mid Q(t_k)\}} \tag{20}$$

subject to the constraints described in Sec. II. Here we examine
how to solve (20) in detail. First, define the state $i$ in any slot
$t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$ as the value of the primary
user queue backlog $Q_{pu}(t)$ in that slot. Now let $\mathcal{R}$ denote the
class of stationary, randomized policies where every policy
$r \in \mathcal{R}$ chooses a power allocation $P_i(r) \in \mathcal{P}$ in each state $i$
according to a stationary distribution. It can be shown that it
is sufficient to only consider policies in $\mathcal{R}$ to maximize (20).
Now suppose a policy $r \in \mathcal{R}$ is implemented on a recurrent
system with fixed $Q_{su}(t_k)$ and $X_{su}(t_k)$ and with the same
state dynamics as our model. Note that $\mu_{su}(t) = 0$ for all
$t$ when the state $i \ge 1$. Then, by basic renewal theory [27],
we have that maximizing the ratio in (20) is equivalent to the
following optimization problem:

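$$\begin{aligned}
\text{Maximize:} \quad & Q_{su}(t_k) E\{\mu_{su}(P_0(r))\}\pi_0(r) - X_{su}(t_k) \sum_{i \ge 0} E\{P_i(r)\}\pi_i(r) \\
\text{Subject to:} \quad & r \in \mathcal{R}
\end{aligned} \tag{21}$$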



where $\pi_i(r)$ is the resulting steady-state probability of being in
state $i$ in the recurrent system under the stationary, randomized
policy $r$ and where the expectations above are with respect to
$r$. Note that well-defined steady-state probabilities $\pi_i(r)$ exist
for all $r \in \mathcal{R}$ because we have assumed that $\lambda_{pu} < \phi_{nc}$ so
that even if no cooperation is used, the primary queue is stable
and the system is recurrent. Thus, solving (20) is equivalent to
solving the unconstrained time average maximization problem
(21) over the class of stationary, randomized policies. Note that
(21) is an infinite horizon Markov decision problem (MDP)
over the state space $i \in \{0, 1, 2, \ldots\}$. We study this problem
in the following.

Consider the optimal stationary, randomized policy that
maximizes the objective in (21). Let $\chi_i$ denote the probability




Fig. 3. Birth-Death Markov Chain over the system state where the system 
state represents the primary user queue backlog. 



distribution over $\mathcal{P}$ that is used by this policy to choose a
power allocation $P_i$ in state $i$. Let $\mu_i$ denote the resulting
effective probability of successful primary transmission in
state $i \ge 1$. Then we have that $\mu_i = E_{\chi_i}\{\phi(P_i)\}$ where
$\phi(P_i)$ denotes the probability of successful transmission in
state $i$ when the secondary user spends power $P_i$ in cooperative
transmission with the primary user. Since the system is stable
and has a well-defined steady-state distribution, we can write
down the detailed balance equations for the Markov Chain that
describes the state transitions of the system as follows (see Fig. 3):

$$\begin{aligned}
\pi_0 \lambda_{pu} &= \pi_1 (1 - \lambda_{pu})\mu_1 \\
\pi_1 \lambda_{pu}(1 - \mu_1) &= \pi_2 (1 - \lambda_{pu})\mu_2 \\
&\;\;\vdots \\
\pi_i \lambda_{pu}(1 - \mu_i) &= \pi_{i+1}(1 - \lambda_{pu})\mu_{i+1} \quad \forall i \ge 1
\end{aligned}$$

where $\pi_i$ denotes the steady-state probability of being in state
$i$ under this policy. Summing over all $i$ yields:

$$\lambda_{pu} = \sum_{i \ge 1} \pi_i \mu_i \tag{22}$$
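The identity obtained by summing the balance equations can be checked numerically on a truncated version of the birth-death chain. The sketch below assumes a truncation at `num_states` states (the original chain is infinite), a caller-supplied function `mu(i)` giving the success probability in state $i \ge 1$, and illustrative function names.

```python
def stationary_distribution(lam_pu, mu, num_states=200):
    """Stationary distribution of the truncated birth-death chain.

    Uses pi_0 * lam = pi_1 * (1 - lam) * mu(1) and, for i >= 1,
    pi_i * lam * (1 - mu(i)) = pi_{i+1} * (1 - lam) * mu(i + 1).
    """
    weights = [1.0]
    weights.append(weights[0] * lam_pu / ((1 - lam_pu) * mu(1)))
    for i in range(1, num_states - 1):
        up = lam_pu * (1 - mu(i))            # arrival and failed transmission
        down = (1 - lam_pu) * mu(i + 1)      # no arrival and successful transmission
        weights.append(weights[i] * up / down)
    total = sum(weights)
    return [w / total for w in weights]


def primary_throughput(pi, mu):
    """Sum over busy states of pi_i * mu(i), the right-hand side of (22)."""
    return sum(pi[i] * mu(i) for i in range(1, len(pi)))


# Hypothetical state-dependent success probabilities.
mu = lambda i: 0.5 if i % 2 else 0.7
pi = stationary_distribution(0.2, mu)
```

With a stable chain the truncation error decays geometrically in the number of states, so the computed throughput matches the arrival rate to within numerical precision for these sample values.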



The average power incurred in cooperative transmissions under
this policy is given by:

$$\overline{P} = \sum_{i \ge 1} \pi_i E_{\chi_i}\{P_i\} \tag{23}$$



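Now consider an alternate stationary policy that uses the
following fixed distribution $\chi'$ for choosing control action $P'$
in all states $i \ge 1$:

$$\chi' \triangleq \begin{cases}
\chi_1 & \text{with probability } \pi_1 / \sum_{j \ge 1} \pi_j \\
\chi_2 & \text{with probability } \pi_2 / \sum_{j \ge 1} \pi_j \\
\;\vdots & \\
\chi_i & \text{with probability } \pi_i / \sum_{j \ge 1} \pi_j \\
\;\vdots &
\end{cases} \tag{24}$$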



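Let $\mu'$ denote the resulting effective probability of a successful
primary transmission in any state $i \ge 1$. Note that this
is the same for all states by the definition (24). Then, we have
that:

$$\mu' = \sum_{i \ge 1} \frac{\pi_i}{\sum_{j \ge 1} \pi_j}\,\mu_i = \frac{\sum_{i \ge 1} \pi_i \mu_i}{\sum_{j \ge 1} \pi_j} \tag{25}$$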



Let $\pi'_i$ denote the steady-state probability of being in state $i$
under this alternate policy. Note that the system is stable under
this alternate policy as well. Thus, using the detailed balance
equations for the Markov Chain that describes the state transitions
of the system under this policy yields

$$\lambda_{pu} = \sum_{k \ge 1} \pi'_k \mu' = \sum_{k \ge 1} \pi'_k \Big(\frac{\sum_{i \ge 1} \pi_i \mu_i}{\sum_{j \ge 1} \pi_j}\Big) = \frac{\sum_{k \ge 1} \pi'_k}{\sum_{j \ge 1} \pi_j}\,\lambda_{pu} \tag{26}$$

where we used d22l in the last step. This implies that 
Efc>i 7r fe = Ej>i an d therefore 7r = 7To. Also, the 
average power incurred in cooperative transmissions under this 
alternate policy is given by: 

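$$\bar{P}' = \sum_{k \geq 1} \pi'_k \mathbb{E}_{\chi'}\{P'\} = \sum_{k \geq 1} \pi'_k \left( \frac{\sum_{i \geq 1} \pi_i \mathbb{E}_{\chi_i}\{P_i\}}{\sum_{j \geq 1} \pi_j} \right) = \sum_{k \geq 1} \pi'_k \frac{\bar{P}}{\sum_{j \geq 1} \pi_j} = \bar{P} \tag{27}$$

where we used (23) in the second last step and $\sum_{k \geq 1} \pi'_k = \sum_{j \geq 1} \pi_j$ in the last step.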
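The averaging argument above can be checked numerically. The following is a minimal sketch, assuming a hypothetical state-dependent policy (the values in `mu`) and a chain truncated at 200 states; neither choice comes from the paper. It solves the balance equations, forms $\mu'$ as in (25), and compares the idle probability of the two policies.

```python
def stationary(lam, mu):
    """Steady-state distribution of the birth-death chain of Fig. 3.

    lam: primary arrival probability per slot.
    mu:  list where mu[i-1] is the success probability in state i >= 1.
    The chain is truncated at len(mu) states above 0 (an assumption).
    """
    w = [1.0, lam / ((1.0 - lam) * mu[0])]  # unnormalized pi_0 and pi_1
    for i in range(1, len(mu)):
        # pi_i * lam * (1 - mu_i) = pi_{i+1} * (1 - lam) * mu_{i+1}
        w.append(w[i] * lam * (1.0 - mu[i - 1]) / ((1.0 - lam) * mu[i]))
    total = sum(w)
    return [x / total for x in w]


lam = 0.3
# Hypothetical policy: stronger cooperation in the first five busy states.
mu = [0.8 if i <= 5 else 0.6 for i in range(1, 201)]

pi = stationary(lam, mu)
busy = sum(pi[1:])
mu_alt = sum(p * m for p, m in zip(pi[1:], mu)) / busy  # equation (25)
pi_alt = stationary(lam, [mu_alt] * len(mu))            # alternate policy

print("pi_0 original :", pi[0])
print("pi_0 alternate:", pi_alt[0])
```

Both printed values should agree up to floating point error, which mirrors the conclusion $\pi'_0 = \pi_0$ drawn from (26).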

Thus, if we choose $\chi' = \chi_0$ in state $i = 0$ and choose $\chi'$ as defined in (24) in all other states, it can be seen that the alternate policy achieves the same time average value of the objective (21) as the optimal policy. This implies that to maximize (21), it is sufficient to optimize over the class of stationary policies that use the same distribution for choosing $P_i$ for all states $i \geq 1$. Denote this class by $\mathcal{R}'$. Then for all $i \geq 1$, we have that $\mathbb{E}\{P_i(r)\} = \mathbb{E}\{P_1(r)\}$ for all $r \in \mathcal{R}'$. Using this and the fact that $1 - \pi_0(r) = \sum_{i \geq 1} \pi_i(r)$, (21) can be simplified as follows:

$$\begin{aligned}
\text{Maximize:} \quad & [Q_{su}(t_k)\mathbb{E}\{\mu_{su}(P_0(r))\} - X_{su}(t_k)\mathbb{E}\{P_0(r)\}]\,\pi_0(r) \\
& - X_{su}(t_k)\mathbb{E}\{P_1(r)\}(1 - \pi_0(r)) \\
\text{Subject to:} \quad & r \in \mathcal{R}'
\end{aligned} \tag{28}$$

where $\pi_0(r)$ is the resulting steady-state probability of being in state 0 and where $\mathbb{E}\{P_1(r)\}$ is the average power incurred in cooperative transmission in state $i = 1$ (same for all states $i \geq 1$). Next, note that the control decisions taken by the secondary user in state $i = 0$ do not affect the length of the frame and therefore $\pi_0(r)$. Further, the expectations can be removed. Therefore the first term in the problem above can be maximized separately as follows:

$$\begin{aligned}
\text{Maximize:} \quad & Q_{su}(t_k)\mu_{su}(P_0) - X_{su}(t_k)P_0 \\
\text{Subject to:} \quad & P_0 \in \mathcal{P}
\end{aligned} \tag{29}$$



This is the same as (17). Let $P_0^*$ denote the optimal solution to (29) and let $\theta^* = Q_{su}(t_k)\mu_{su}(P_0^*) - X_{su}(t_k)P_0^*$ denote the value of the objective of (29) under the optimal solution. Note that we must have $\theta^* \geq 0$ because the value of the objective when the secondary user chooses $P_0 = 0$ (i.e., stays idle) is 0. Then, (28) can be written as:

$$\begin{aligned}
\text{Maximize:} \quad & \theta^* \pi_0(r) - X_{su}(t_k)\mathbb{E}\{P_1(r)\}(1 - \pi_0(r)) \\
\text{Subject to:} \quad & r \in \mathcal{R}'
\end{aligned} \tag{30}$$



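The effective probability of a successful primary transmission in any state $i \geq 1$ is given by $\mathbb{E}\{\phi(P_1(r))\}$. Using Little's Theorem, we have $\pi_0(r) = 1 - \frac{\lambda_{pu}}{\mathbb{E}\{\phi(P_1(r))\}}$. Using this and rearranging the objective in (30) and ignoring the constant terms, we have the following equivalent problem:

$$\begin{aligned}
\text{Minimize:} \quad & \frac{\theta^* + X_{su}(t_k)\mathbb{E}\{P_1(r)\}}{\mathbb{E}\{\phi(P_1(r))\}} \\
\text{Subject to:} \quad & r \in \mathcal{R}'
\end{aligned} \tag{31}$$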



It can be shown that it is sufficient to consider only deterministic power allocations to solve (31) (see, for example, [21, Section 7.3.2]). This yields the following problem:

$$\begin{aligned}
\text{Minimize:} \quad & \frac{\theta^* + X_{su}(t_k)P_1}{\phi(P_1)} \\
\text{Subject to:} \quad & P_1 \in \mathcal{P}
\end{aligned} \tag{32}$$



This is the same as (18). Note that solving this problem does not require knowledge of $\lambda_{pu}$ or $\lambda_{su}$ and can be solved easily for general power allocation options $\mathcal{P}$. We present an example that admits a particularly simple solution to this problem.

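Suppose $\mathcal{P} = \{0, P_{max}\}$ so that the secondary user can either cooperate with full power $P_{max}$ or not cooperate (with power expenditure 0) with the primary user. Then, the optimal solution to (32) can be calculated by comparing the value of its objective for $P_1 \in \{0, P_{max}\}$, that is, by comparing $\theta^*/\phi_{nc}$ with $(\theta^* + X_{su}(t_k)P_{max})/\phi_c$ where $\phi_{nc} = \phi(0)$ and $\phi_c = \phi(P_{max})$. This yields the following simple threshold-based rule:

$$P_1^* = \begin{cases}
0 & \text{if } X_{su}(t_k) > \dfrac{\theta^*(\phi_c - \phi_{nc})}{P_{max}\,\phi_{nc}} \\
P_{max} & \text{else}
\end{cases} \tag{33}$$

We also note that this threshold can be computed without any knowledge of the input rates $\lambda_{pu}, \lambda_{su}$.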

To summarize, the overall solution to (16) is given by the pair $(P_0^*, P_1^*)$ where $P_0^*$ denotes the power allocation used by the secondary user for its own transmission when the primary user is idle and $P_1^*$ denotes the power used by the secondary user for cooperative transmission. Note that these values remain fixed for the entire duration of frame $k$. However, these can change from one frame to another depending on the values of the queues $Q_{su}(t_k), X_{su}(t_k)$. The computation of $(P_0^*, P_1^*)$ can be carried out using a two-step process as follows:

1) First, compute $P_0^*$ by solving problem (29). Let $\theta^*$ be the value of the objective of (29) under the optimal solution.

2) Then compute $P_1^*$ by solving problem (32).
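The two-step process can be sketched in a few lines for a finite set of power options. This is an illustrative sketch only: the function name `frame_powers`, the callables `mu_su` and `phi`, and the numerical values in the usage lines are assumptions made here, not taken from the paper.

```python
def frame_powers(q_su, x_su, powers, mu_su, phi):
    """Return (P0*, P1*, theta*) for one frame.

    q_su, x_su: queue values Q_su(t_k) and X_su(t_k) at the frame start.
    powers:     finite list of power options (the set P), containing 0.
    mu_su(P):   secondary service rate at power P (hypothetical callable).
    phi(P):     primary success probability at cooperative power P, > 0.
    """
    # Step 1, problem (29): power for the secondary user's own transmissions.
    p0 = max(powers, key=lambda p: q_su * mu_su(p) - x_su * p)
    theta = q_su * mu_su(p0) - x_su * p0
    # Step 2, problem (32): power for cooperative transmissions.
    p1 = min(powers, key=lambda p: (theta + x_su * p) / phi(p))
    return p0, p1, theta


# Two-option example P = {0, P_max} with P_max = 1 (assumed values).
mu_su = lambda p: 1.0 if p > 0 else 0.0
phi = lambda p: 0.8 if p > 0 else 0.6

low_x = frame_powers(10.0, 2.0, [0.0, 1.0], mu_su, phi)
high_x = frame_powers(10.0, 3.0, [0.0, 1.0], mu_su, phi)
print(low_x, high_x)
```

With a small virtual power queue the sketch chooses to cooperate, and with a larger one it does not, which is the threshold behavior described for the two-option case.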

It is interesting to note that in order to implement this algorithm, the secondary user does not require knowledge of the current queue backlog value of the primary user. Rather, it only needs to know the values of its own queues and whether the current slot is in the "PU Idle" or "PU Busy" part of the frame. This is quite different from the conventional solution to the MDP (21) which is typically a different randomized policy for each value of the state (i.e., the primary queue backlog).

V. Performance Analysis 

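To analyze the performance of the Frame-Based-Drift-Plus-Penalty-Algorithm, we compare its Lyapunov drift with that of the optimal stationary, randomized policy STAT of Lemma 1. First, note that by basic renewal theory [27], the performance guarantees provided by STAT hold over every frame $k \in \{1, 2, 3, \ldots\}$. Specifically, let $t_k$ be the start of the $k$th frame. Suppose STAT is implemented over this frame. Then the following hold:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} R_{su}^{stat}(t) \right\} = \mathbb{E}\{\hat{T}[k]\}\,\nu^* \tag{34}$$

$$\mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} R_{su}^{stat}(t) \right\} \leq \mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} \mu_{su}^{stat}(t) \right\} \tag{35}$$

$$\mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} P_{su}^{stat}(t) \right\} \leq \mathbb{E}\{\hat{T}[k]\}\,P_{avg} \tag{36}$$

where $\hat{t}_{k+1}$ and $\hat{T}[k]$ denote the start of the $(k+1)$th frame and the length of the $k$th frame, respectively, under the policy STAT. Similarly, $R_{su}^{stat}(t)$, $P_{su}^{stat}(t)$, $\mu_{su}^{stat}(t)$ denote the resource allocation decisions under STAT.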

Next, we define an alternate control algorithm ALT that will 
be useful in analyzing the performance of the Frame-Based- 
Drift-Plus-Penalty-Algorithm. 

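Algorithm ALT: In each frame $k \in \{1, 2, 3, \ldots\}$, do the following:

1) Admission Control: For all $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$, choose $R_{su}(t)$ as follows:

$$R_{su}(t) = \begin{cases}
A_{su}(t) & \text{if } Q_{su}(t_k) \leq V \\
0 & \text{else}
\end{cases} \tag{37}$$

2) Resource Allocation: Choose a policy that maximizes the following ratio:

$$\frac{\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} \left( Q_{su}(t_k)\mu_{su}(t) - X_{su}(t_k)P_{su}(t) \right) \,\middle|\, Q(t_k) \right\}}{\mathbb{E}\{T[k] \mid Q(t_k)\}} \tag{38}$$

3) Queue Update: After implementing this policy, update the queues $Q_{su}$ and $X_{su}$ using their respective update equations.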

By comparing with the Frame-Based-Drift-Plus-Penalty-Algorithm, it can be seen that this algorithm differs only in the admission control part while the resource allocation decisions are exactly the same. Specifically, under ALT, the queue backlog $Q_{su}(t_k)$ at the start of the $k$th frame is used for making admission control decisions for the entire duration of that frame. However, under the Frame-Based-Drift-Plus-Penalty-Algorithm, the queue backlog $Q_{su}(t)$ at the start of each slot is used for making admission control decisions. Note that since the length of the frame depends only on the resource allocation decisions and they are the same under the two algorithms, it follows that implementing them with the same starting backlog $Q(t_k)$ yields the same frame lengths.

The following lemma compares the value of the second term in the Lyapunov drift bound (14) that corresponds to the admission control decisions under these two algorithms.

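Lemma 2: Let $R_{su}^{fab}(t)$ and $R_{su}^{alt}(t)$ denote the admission control decisions made by the Frame-Based-Drift-Plus-Penalty-Algorithm and the ALT algorithm respectively for all $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$. Then we have:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V) R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\} \geq \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V) R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} - C \tag{39}$$

where $C \triangleq \frac{D A_{max}(A_{max} + \mu_{max})}{2}$ is a constant that does not depend on $V$.

Proof: See Appendix A. ■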
We are now ready to characterize the performance of the 
Frame-Based-Drift-Plus-Penalty-Algorithm. 

Theorem 1: (Performance Theorem) Suppose the Frame-Based-Drift-Plus-Penalty-Algorithm is implemented over all frames $k \in \{1, 2, 3, \ldots\}$ with initial condition $Q_{su}(0) = 0, X_{su}(0) = 0$ and with a control parameter $V > 0$. Let $\mu_{su}^{fab}(t), P_{su}^{fab}(t)$ denote the resource allocation decisions under this algorithm. Then, we have:

1) The secondary user queue backlog $Q_{su}(t)$ is upper bounded for all $t$:

$$Q_{su}(t) \leq Q_{max} \triangleq A_{max} + V \tag{40}$$

2) The virtual power queue $X_{su}(t_k)$ is mean rate stable, i.e.,

$$\lim_{K \to \infty} \frac{\mathbb{E}\{X_{su}(t_K)\}}{K} = 0 \tag{41}$$

Further, we have:

$$\limsup_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} \left( P_{su}^{fab}(t) - P_{avg} \right) \right\} \leq 0 \tag{42}$$



lim sup 



< P 



(43) 



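3) The time-average secondary user throughput (defined over frames) satisfies the following bound for all $K > 0$:

$$\frac{\sum_{k=1}^{K} \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \right\}}{\sum_{k=1}^{K} \mathbb{E}\{T[k]\}} \geq \nu^* - \frac{B + C}{V\,T_{min}} \tag{44}$$

where $B$ and $C$ are the constants appearing in the drift bound (14) and in Lemma 2, respectively.

Theorem 1 shows that the time-average secondary user throughput can be pushed to within $O(1/V)$ of the optimal value with a trade-off in the worst case queue backlog. By Little's Theorem, this leads to an $O(1/V, V)$ utility-delay tradeoff.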
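The deterministic backlog bound in part 1 of Theorem 1 is easy to observe in simulation. The sketch below is illustrative only: it assumes a queue update of the form $Q(t+1) = \max[Q(t) - \mu(t), 0] + R(t)$, uniformly distributed integer arrivals and service, and admission whenever the backlog does not exceed $V$. None of these modeling details are taken verbatim from the paper.

```python
import random


def simulate_backlog(V, a_max, mu_max, slots, seed=0):
    """Track the peak backlog under threshold-based admission control."""
    rng = random.Random(seed)
    q, q_peak = 0, 0
    for _ in range(slots):
        arrivals = rng.randint(0, a_max)      # new packets, at most a_max
        service = rng.randint(0, mu_max)      # offered service this slot
        admitted = arrivals if q <= V else 0  # threshold admission rule
        q = max(q - service, 0) + admitted    # assumed queue update
        q_peak = max(q_peak, q)
    return q_peak


peak = simulate_backlog(V=50, a_max=3, mu_max=2, slots=20000)
print("peak backlog:", peak, "bound:", 50 + 3)
```

The peak backlog never exceeds $A_{max} + V$ on any sample path, since packets are admitted only when the backlog is at most $V$ and at most $A_{max}$ packets arrive per slot.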

Proof: Part (1): We argue by induction. First, note that (40) holds for $t = 0$. Next, suppose $Q_{su}(t) \leq Q_{max}$ for some $t \geq 0$. We will show that $Q_{su}(t+1) \leq Q_{max}$. We have two cases. First, suppose $Q_{su}(t) \leq V$. Then, by the queue update equation, the maximum that $Q_{su}(t)$ can increase is $A_{max}$ so that $Q_{su}(t+1) \leq A_{max} + V = Q_{max}$. Next, suppose $Q_{su}(t) > V$. Then, the admission control decision (15) chooses $R_{su}(t) = 0$. Thus, by the queue update equation, we have that $Q_{su}(t+1) \leq Q_{su}(t) \leq Q_{max}$ for this case as well. Combining these two cases proves the bound (40).

Parts (2) and (3): See Appendix B. ■

VI. Extensions to Basic Model

We consider two extensions to the basic model of Sec. II.

A. Multiple Secondary Users 

Consider the scenario with one primary user as before, 
but with N > 1 secondary users. The primary user channel 
occupancy process evolves as before where the secondary 
users can transmit their own data only when the primary user 
is idle. However, they may cooperatively transmit with the 
primary user to increase its transmission success probability. In 
general, multiple secondary users may cooperatively transmit 
with the primary in one timeslot. However, for simplicity, here 
we assume that at most one secondary user can take part in 
a cooperative transmission per slot. Further, we also assume 
that at most one secondary user can transmit its data when the 
primary user is idle. 

Our formulation can be easily extended to this scenario. Let $\mathcal{P}_i$ denote the set of power allocation options for secondary user $i$. Suppose each secondary user $i$ is subject to average and peak power constraints $P_{avg,i}$ and $P_{max,i}$ respectively. Also, let $\phi_i(P)$ denote the success probability of the primary transmission when secondary user $i$ spends power $P$ in cooperative transmission. Now consider the objective of maximizing the sum total throughput of the secondary users subject to each user's average and peak power constraints and the scheduling constraints of the model. In order to apply the "drift-plus-penalty" ratio method, we use the following queues:

$$Q_i(t_{k+1}) \leq \max\left[ Q_i(t_k) - \sum_{t=t_k}^{t_{k+1}-1} \mu_i(t),\, 0 \right] + \sum_{t=t_k}^{t_{k+1}-1} R_i(t) \tag{45}$$

$$X_i(t_{k+1}) = \max\left[ X_i(t_k) - T[k]P_{avg,i} + \sum_{t=t_k}^{t_{k+1}-1} P_i(t),\, 0 \right] \tag{46}$$



where $Q_i(t_k)$ is the queue backlog of secondary user $i$ at the beginning of the $k$th frame, $\mu_i(t)$ is the service rate of secondary user $i$ in slot $t$, $R_i(t)$ and $P_i(t)$ denote the number of new packets admitted and the power expenditure incurred by the secondary user $i$ in slot $t$. Finally, $t_{k+1}$ denotes the start of the $(k+1)$th frame and $T[k] = t_{k+1} - t_k$ is the length of the $k$th frame as before.
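The frame-level queue updates can be written directly as code. The following is a small sketch with hypothetical helper names; the backlog update treats the bound (45) as an equality, which is an assumption made here for illustration.

```python
def update_backlog(q, frame_service, frame_admitted):
    """Frame-level backlog update, using (45) with equality (an assumption)."""
    return max(q - sum(frame_service), 0) + sum(frame_admitted)


def update_virtual_power_queue(x, frame_powers, p_avg):
    """Virtual power queue update (46) for a frame of len(frame_powers) slots."""
    frame_len = len(frame_powers)
    return max(x - frame_len * p_avg + sum(frame_powers), 0.0)


# A 4-slot frame in which the user spends unit power in 3 slots.
x_next = update_virtual_power_queue(0.0, [1.0, 1.0, 0.0, 1.0], p_avg=0.5)
# A 4-slot frame with no power spent drains the virtual queue.
x_after = update_virtual_power_queue(x_next, [0.0, 0.0, 0.0, 0.0], p_avg=0.5)
q_next = update_backlog(5, [1, 1, 1], [2, 0, 0])
print(x_next, x_after, q_next)
```

The virtual queue grows when a frame uses more than $T[k]P_{avg,i}$ units of power and shrinks otherwise, which is how it tracks the average power constraint.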

Let $Q(t_k) = (Q_1(t_k), \ldots, Q_N(t_k), X_1(t_k), \ldots, X_N(t_k))$ denote the queueing state of the system at the start of the $k$th frame. Using a Lyapunov function $L(Q(t_k)) \triangleq \frac{1}{2}\left[ \sum_{i=1}^{N} Q_i^2(t_k) + \sum_{i=1}^{N} X_i^2(t_k) \right]$ and following the steps in Sec. III yields the following Multi-User Frame-Based-Drift-Plus-Penalty-Algorithm. In each frame $k \in \{1, 2, 3, \ldots\}$, do the following:

1) Admission Control: For all $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$, for each secondary user $i \in \{1, 2, \ldots, N\}$, choose $R_i(t)$ as follows:

$$R_i(t) = \begin{cases}
A_i(t) & \text{if } Q_i(t) \leq V \\
0 & \text{else}
\end{cases} \tag{47}$$

where $A_i(t)$ is the number of new arrivals to secondary user $i$ in slot $t$.

2) Resource Allocation: Choose a policy that maximizes the following ratio:

$$\frac{\sum_{i=1}^{N} \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} \left( Q_i(t_k)\mu_i(t) - X_i(t_k)P_i(t) \right) \,\middle|\, Q(t_k) \right\}}{\mathbb{E}\{T[k] \mid Q(t_k)\}} \tag{48}$$

3) Queue Update: After implementing this policy, update the queues as in (45) and (46).

Similar to the basic model, this algorithm can be implemented without any knowledge of the arrival rates $\lambda_i$ or $\lambda_{pu}$. Further, using the techniques developed in Sec. IV it can be shown that the solution to (48) can be computed in two steps as follows. First, we solve the following problem for each $i \in \{1, 2, \ldots, N\}$:

$$\begin{aligned}
\text{Maximize:} \quad & Q_i(t_k)\mu_i(P) - X_i(t_k)P \\
\text{Subject to:} \quad & P \in \mathcal{P}_i
\end{aligned} \tag{49}$$

Let $P_0^*$ denote the optimal solution to (49) achieved by user $i^*$ and let $\theta^*$ denote the optimal objective value. This means user $i^*$ transmits on all idle slots of frame $k$ with power $P_0^*$. Next, to determine the optimal cooperative transmission strategy, we solve the following problem for each $i \in \{1, 2, \ldots, N\}$:

$$\begin{aligned}
\text{Minimize:} \quad & \frac{\theta^* + X_i(t_k)P}{\phi_i(P)} \\
\text{Subject to:} \quad & P \in \mathcal{P}_i
\end{aligned} \tag{50}$$



Let $P_1^*$ denote the optimal solution to (50) achieved by user $j^*$. This means user $j^*$ cooperatively transmits on all busy slots of frame $k$ with power $P_1^*$.
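The multi-user selection can be sketched as two searches over (user, power) pairs. This is a minimal illustration, assuming that the objective of (50) has the same form as the single-user problem (32) and that each user has a finite power set; the function name and the example numbers are hypothetical.

```python
def multi_user_frame_decision(Q, X, powers, mu, phi):
    """Pick (i*, P0*) for idle slots and (j*, P1*) for busy slots.

    Q, X:    lists of queue values Q_i(t_k), X_i(t_k).
    powers:  powers[i] is the finite power set of user i.
    mu, phi: lists of callables mu_i(P) and phi_i(P) (assumed inputs).
    """
    users = range(len(Q))
    # Problem (49): best own-transmission choice over all users.
    theta, i_star, p0 = max(
        ((Q[i] * mu[i](p) - X[i] * p, i, p) for i in users for p in powers[i]),
        key=lambda c: c[0],
    )
    # Problem (50): best cooperative choice over all users.
    _, j_star, p1 = min(
        (((theta + X[j] * p) / phi[j](p), j, p) for j in users for p in powers[j]),
        key=lambda c: c[0],
    )
    return i_star, p0, j_star, p1, theta


rate = lambda p: 1.0 if p > 0 else 0.0
decision = multi_user_frame_decision(
    Q=[10.0, 4.0],
    X=[2.0, 0.5],
    powers=[[0.0, 1.0], [0.0, 1.0]],
    mu=[rate, rate],
    phi=[lambda p: 0.8 if p > 0 else 0.6, lambda p: 0.9 if p > 0 else 0.6],
)
print(decision)
```

In this example the user with the larger backlog transmits on idle slots, while the user with the smaller virtual power queue and the better cooperative gain handles the busy slots.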



B. Fading Channels 

Next, suppose there is an additional channel fading process $S(t)$ that takes values from a finite set $\mathcal{S}$ in an i.i.d. fashion every slot. We assume that in every slot, $\text{Prob}[S(t) = s] = q_s$ for all $s \in \mathcal{S}$. The success probability with cooperative transmission now is a function of both the power allocation and the fading state in that slot. Specifically, suppose the primary user is active in slot $t$ and the secondary user allocates power $P(t)$ for cooperative transmission. Also suppose $S(t) = s$. Then the random success/failure outcome of the primary transmission is given by an indicator variable $\mu_{pu}(P(t), s)$ and the success probability is given by $\phi_s(P(t)) = \mathbb{E}\{\mu_{pu}(P(t), s)\}$. The function $\phi_s(P)$ is known to the network controller for all $s \in \mathcal{S}$ and is assumed to be non-decreasing in $P$ for each $s \in \mathcal{S}$. For simplicity, we assume that the secondary user transmission rate $\mu_{su}(t)$ depends only on $P(t)$.

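By applying the "drift-plus-penalty" ratio method to this extended model, we get the following control algorithm. The admission control remains the same as (15). The resource allocation part involves maximizing the ratio in (16). Using the same arguments as before in Sec. IV, it can be shown that maximizing this ratio is equivalent to the following optimization problem:

$$\begin{aligned}
\text{Max:} \quad & Q_{su}(t_k)\mathbb{E}\{\mu_{su}(P_0(r))\}\pi_0(r) - X_{su}(t_k)\mathbb{E}\{P_0(r)\}\pi_0(r) \\
& - X_{su}(t_k) \sum_{i \geq 1} \sum_{s \in \mathcal{S}} \mathbb{E}\{P_{i,s}(r)\}\pi_{i,s}(r) \\
\text{Subject to:} \quad & r \in \mathcal{R}
\end{aligned} \tag{51}$$

where $\pi_{i,s}(r)$ is the resulting steady-state probability of being in state $(i, s)$ in the recurrent system under the stationary, randomized policy $r$ and where the expectations above are with respect to $r$. We study this problem in the following.

Consider the optimal stationary, randomized policy that maximizes the objective in (51). Let $\chi_{i,s}$ denote the probability distribution over $\mathcal{P}$ that is used by this policy to choose a control action $P_{i,s}$ in state $(i, s)$. Let $\mu_{i,s} = \mathbb{E}_{\chi_{i,s}}\{\phi_s(P_{i,s})\}$ denote the resulting effective probability of successful primary transmission in state $(i, s)$ where $i \geq 1$. Since the system is stable under any stationary policy, total incoming rate = total outgoing rate. Thus, we get:

$$\lambda_{pu} = \sum_{i \geq 1} \sum_{s \in \mathcal{S}} \mu_{i,s}\pi_{i,s} \tag{52}$$

where $\pi_{i,s}$ denotes the steady-state probability of being in state $(i, s)$ under this policy. Note that the system is stable and has a well-defined steady-state distribution. The average power incurred in cooperative transmissions under this policy is given by:

$$\bar{P} = \sum_{i \geq 1} \sum_{s \in \mathcal{S}} \pi_{i,s} \mathbb{E}_{\chi_{i,s}}\{P_{i,s}\} \tag{53}$$

Now consider an alternate stationary policy that, for each $s \in \mathcal{S}$, uses the following fixed distribution $\chi'_s$ for choosing control action $P'_s$ in all states $(i, s)$ where $i \geq 1$:

$$\chi'_s = \begin{cases}
\chi_{1,s} & \text{with probability } \pi_{1,s} / \sum_{j \geq 1} \pi_{j,s} \\
\chi_{2,s} & \text{with probability } \pi_{2,s} / \sum_{j \geq 1} \pi_{j,s} \\
\;\vdots & \\
\chi_{i,s} & \text{with probability } \pi_{i,s} / \sum_{j \geq 1} \pi_{j,s} \\
\;\vdots &
\end{cases} \tag{54}$$

For each $s \in \mathcal{S}$, let $\mu'_s$ denote the resulting effective probability of a successful primary transmission in any state $(i, s)$ where $i \geq 1$ under this policy. Note that this is the same for all states $(i, s)$ where $i \geq 1$ by the definition (54). Then, we have that:

$$\mu'_s = \frac{\sum_{i \geq 1} \mu_{i,s}\pi_{i,s}}{\sum_{j \geq 1} \pi_{j,s}} \tag{55}$$

Let $\pi'_{i,s}$ denote the steady-state probability of being in state $(i, s)$ under this alternate policy. Since the system is stable under any stationary policy, total incoming rate = total outgoing rate. Thus, we get:

$$\lambda_{pu} = \sum_{s \in \mathcal{S}} \sum_{k \geq 1} \pi'_{k,s}\mu'_s = \sum_{s \in \mathcal{S}} \mu'_s \left( \sum_{k \geq 1} \pi'_{k,s} \right) = \sum_{s \in \mathcal{S}} \left( \sum_{k \geq 1} \pi'_{k,s} \right) \frac{\sum_{i \geq 1} \mu_{i,s}\pi_{i,s}}{\sum_{j \geq 1} \pi_{j,s}} \tag{56}$$

where we used (55) in the last step. Since $S(t)$ is i.i.d., for any $s_1, s_2 \in \mathcal{S}$, we have that

$$\frac{\pi_0 q_{s_1} + \sum_{i \geq 1} \pi_{i,s_1}}{\pi_0 q_{s_2} + \sum_{i \geq 1} \pi_{i,s_2}} = \frac{q_{s_1}}{q_{s_2}}$$

Similarly, we have:

$$\frac{\pi'_0 q_{s_1} + \sum_{i \geq 1} \pi'_{i,s_1}}{\pi'_0 q_{s_2} + \sum_{i \geq 1} \pi'_{i,s_2}} = \frac{q_{s_1}}{q_{s_2}} \tag{57}$$

Using this, for any $s_1, s_2 \in \mathcal{S}$, we have:

$$\frac{\sum_{j \geq 1} \pi'_{j,s_1}}{\sum_{j \geq 1} \pi_{j,s_1}} = \frac{\sum_{j \geq 1} \pi'_{j,s_2}}{\sum_{j \geq 1} \pi_{j,s_2}}$$

Using this in (56), we have for each $s \in \mathcal{S}$:

$$\lambda_{pu} = \frac{\sum_{k \geq 1} \pi'_{k,s}}{\sum_{j \geq 1} \pi_{j,s}} \sum_{s' \in \mathcal{S}} \sum_{i \geq 1} \mu_{i,s'}\pi_{i,s'} = \frac{\sum_{k \geq 1} \pi'_{k,s}}{\sum_{j \geq 1} \pi_{j,s}} \lambda_{pu} \tag{58}$$

where we used (52) in the last step. This implies that $\sum_{k \geq 1} \pi'_{k,s} = \sum_{j \geq 1} \pi_{j,s}$ for every $s \in \mathcal{S}$ and therefore $\pi'_0 = \pi_0$. Also, the average power incurred in cooperative transmissions under this alternate policy is given by: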

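$$\bar{P}' = \sum_{k \geq 1} \sum_{s \in \mathcal{S}} \pi'_{k,s} \mathbb{E}_{\chi'_s}\{P'_s\} = \sum_{k \geq 1} \sum_{s \in \mathcal{S}} \pi'_{k,s} \left( \frac{\sum_{i \geq 1} \pi_{i,s} \mathbb{E}_{\chi_{i,s}}\{P_{i,s}\}}{\sum_{j \geq 1} \pi_{j,s}} \right) = \sum_{s \in \mathcal{S}} \sum_{i \geq 1} \pi_{i,s} \mathbb{E}_{\chi_{i,s}}\{P_{i,s}\} = \bar{P} \tag{59}$$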



where we used the fact that $\sum_{k \geq 1} \pi'_{k,s} = \sum_{j \geq 1} \pi_{j,s}$ for all $s$. Thus, if we choose $\chi' = \chi_0$ in state $i = 0$ and choose $\chi'_s$ as defined in (54) in all states $(i, s)$ where $i \geq 1$, it can be seen that the alternate policy achieves the same time average value of the objective (51) as the optimal policy. This implies that to maximize (51), it is sufficient to optimize over the class of stationary policies that, for each $s \in \mathcal{S}$, use the same distribution for choosing $P_{i,s}$ for all states $(i, s)$ where $i \geq 1$. Denote this class by $\mathcal{R}'$. Using this and the fact that $\sum_{i \geq 1} \pi_{i,s}(r) = (1 - \pi_0(r))q_s$ for all $s$, (51) can be simplified as follows:

$$\begin{aligned}
\text{Maximize:} \quad & [Q_{su}(t_k)\mathbb{E}\{\mu_{su}(P_0(r))\} - X_{su}(t_k)\mathbb{E}\{P_0(r)\}]\,\pi_0(r) \\
& - X_{su}(t_k) \sum_{s \in \mathcal{S}} \mathbb{E}\{P_s(r)\}(1 - \pi_0(r))q_s \\
\text{Subject to:} \quad & r \in \mathcal{R}'
\end{aligned} \tag{60}$$

where $\pi_0(r)$ is the resulting steady-state probability of being in state 0 and where $\mathbb{E}\{P_s(r)\}$ is the average power incurred in cooperative transmission in any state $(i, s)$ with $i \geq 1$. Using the same arguments as before, the solution to (60) can be obtained in two steps as follows. We first compute the solution to (29) as before. Denoting its optimal value by $\theta^*$, (60) can be written as:

$$\begin{aligned}
\text{Maximize:} \quad & \theta^*\pi_0(r) - X_{su}(t_k) \sum_{s \in \mathcal{S}} \mathbb{E}\{P_s(r)\}(1 - \pi_0(r))q_s \\
\text{Subject to:} \quad & r \in \mathcal{R}'
\end{aligned} \tag{61}$$

Fig. 4. Average Secondary User Throughput vs. V.

Fig. 5. Average Secondary User Queue Occupancy vs. V.

Using Little's Theorem, we have $\pi_0(r) = 1 - \frac{\lambda_{pu}}{\sum_{s \in \mathcal{S}} q_s \mathbb{E}\{\phi_s(P_s(r))\}}$. Using this and rearranging the objective in (61) and ignoring the constant terms, we have the following equivalent problem:

$$\begin{aligned}
\text{Maximize:} \quad & \frac{-\theta^* - X_{su}(t_k) \sum_{s \in \mathcal{S}} q_s \mathbb{E}\{P_s(r)\}}{\sum_{s \in \mathcal{S}} q_s \mathbb{E}\{\phi_s(P_s(r))\}} \\
\text{Subject to:} \quad & r \in \mathcal{R}'
\end{aligned} \tag{62}$$



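It can be shown that it is sufficient to consider only deterministic power allocations to solve (62) (see, for example, [21, Section 7.3.2]). This yields the following problem:

$$\begin{aligned}
\text{Maximize:} \quad & \frac{-\theta^* - X_{su}(t_k) \sum_{s \in \mathcal{S}} q_s P_s}{\sum_{s \in \mathcal{S}} q_s \phi_s(P_s)} \\
\text{Subject to:} \quad & P_s \in \mathcal{P} \text{ for all } s \in \mathcal{S}
\end{aligned} \tag{63}$$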



Note that solving this problem does not require knowledge of $\lambda_{pu}$ or $\lambda_{su}$ and can be solved efficiently for general power allocation options $\mathcal{P}$.
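For small fading sets, the per-state cooperative powers can be found by exhaustive search. The sketch below assumes the ratio objective $\left(-\theta^* - X_{su}(t_k)\sum_s q_s P_s\right) / \sum_s q_s \phi_s(P_s)$ and a finite power set; the two fading states, their probabilities and the success probabilities are made-up example values.

```python
from itertools import product


def fading_cooperative_powers(theta, x_su, powers, q, phi):
    """Brute-force search over one cooperative power per fading state.

    q:   dict mapping fading state -> probability q_s.
    phi: dict mapping fading state -> callable phi_s(P), assumed positive.
    """
    states = list(q)
    best_val, best = None, None
    for combo in product(powers, repeat=len(states)):
        num = -theta - x_su * sum(q[s] * p for s, p in zip(states, combo))
        den = sum(q[s] * phi[s](p) for s, p in zip(states, combo))
        val = num / den
        if best_val is None or val > best_val:
            best_val, best = val, dict(zip(states, combo))
    return best, best_val


q = {"good": 0.5, "bad": 0.5}
phi = {
    "good": lambda p: 0.95 if p > 0 else 0.7,
    "bad": lambda p: 0.6 if p > 0 else 0.5,
}
best, value = fading_cooperative_powers(8.0, 2.0, [0.0, 1.0], q, phi)
print(best, value)
```

In this example the search cooperates only in the state where cooperation raises the success probability the most. The search cost grows as $|\mathcal{P}|^{|\mathcal{S}|}$, so it is only meant for small examples.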

VII. Simulations 

In this section, we evaluate the performance of the Frame-Based-Drift-Plus-Penalty-Algorithm using simulations. We consider the network model as discussed in Sec. II with one primary and one secondary user. The set $\mathcal{P}$ consists of only two options $\{0, P_{max}\}$. We assume that $P_{avg} = 0.5$ and $P_{max} = 1$. We set $\phi_{nc} = 0.6$ and $\phi_c = 0.8$. For simplicity, we assume that $\mu_{su}(P_{max}) = 1$.

In the first set of simulations, we fix the input rates $\lambda_{pu} = \lambda_{su} = 0.5$ packets/slot. For these parameters, we can compute the optimal offline solution by linear programming. This yields the maximum secondary user throughput as 0.25 packets/slot. We now simulate the Frame-Based-Drift-Plus-Penalty-Algorithm for different values of the control parameter $V$ over 1000 frames. In Fig. 4 we plot the average throughput achieved by the secondary user over this period. It can be seen that the average throughput increases with $V$ and converges to the optimal value 0.25 packets/slot, with the difference exhibiting an $O(1/V)$ behavior as predicted by Theorem 1. In Fig. 5 we plot the average queue backlog of the secondary user over this period. It can be seen that the average queue backlog grows linearly in $V$, again as predicted by Theorem 1. Also, for all $V$, the average secondary user power consumption over this period was found not to exceed $P_{avg} = 0.5$ units/slot.



For comparison, we also simulate three alternate algorithms. In the first algorithm "No Cooperation", the secondary user never cooperates with the primary user and only attempts to maximize its throughput over the resulting idle periods. The secondary user throughput under this algorithm was found to be 0.166 packets/slot as shown in Fig. 4. Note that using Little's Theorem, the resulting fraction of time the primary user is idle is $1 - \lambda_{pu}/\phi_{nc} = 1 - 0.5/0.6 = 0.166$. This limits the maximum secondary user throughput under the "No Cooperation" case to 0.166 packets/slot.
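The idle fraction $1 - \lambda_{pu}/\phi_{nc}$ can be reproduced with a short Monte Carlo run. This is a rough sketch under assumed slot dynamics (one Bernoulli arrival per slot and one service attempt per slot whenever the primary queue is nonempty at the start of the slot); the function name and seed are arbitrary.

```python
import random


def idle_fraction(lam, phi, slots, seed=1):
    """Estimate the fraction of slots in which the primary queue is empty."""
    rng = random.Random(seed)
    backlog, idle = 0, 0
    for _ in range(slots):
        if backlog == 0:
            idle += 1
            served = 0
        else:
            # Transmission attempt succeeds with probability phi.
            served = 1 if rng.random() < phi else 0
        arrival = 1 if rng.random() < lam else 0
        backlog += arrival - served
    return idle / slots


estimate = idle_fraction(lam=0.5, phi=0.6, slots=400_000)
print("simulated:", estimate, "formula:", 1 - 0.5 / 0.6)
```

The simulated value should land close to 0.166, up to sampling noise.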

In the second algorithm, we consider the "Always Cooper- 
ate" case where the secondary user always cooperates with the 
primary user. For the example under consideration, this uses 
up all the secondary user power and thus, the secondary user 
achieves zero throughput. 

In the third algorithm "Counter Based Policy", a running average of the total secondary user power consumption so far is maintained. In each slot, the secondary user decides to transmit/cooperate only if this running average is smaller than $P_{avg}$. The maximum secondary user throughput under this algorithm was found to be 0.137 packets/slot. This demonstrates that simply satisfying the average power constraint is not sufficient to achieve maximum throughput. For example, it may be the case that under the "Counter Based Policy", the running average condition is usually satisfied when the primary user is busy. This causes the secondary user to cooperate. However, by the time the primary user next becomes idle, the running average exceeds $P_{avg}$ so that the secondary user does not transmit its own data. In contrast, the Frame-Based-Drift-Plus-Penalty-Algorithm is able to find the opportune moments to cooperate/transmit optimally.
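The "Counter Based Policy" is simple enough to state as code. The class below is a hypothetical sketch of the rule as described in the text (spend power only while the running average is below $P_{avg}$); the class and method names are not from the paper, and the average before the first slot is taken to be 0 as an assumption.

```python
class CounterBasedPolicy:
    """Spend power in a slot only if the running average is below p_avg."""

    def __init__(self, p_avg):
        self.p_avg = p_avg
        self.total_power = 0.0
        self.slots = 0

    def running_average(self):
        # Average power per slot so far; 0 before the first slot (assumed).
        return self.total_power / self.slots if self.slots else 0.0

    def decide(self, p_max):
        """Return the power used in the current slot and record it."""
        power = p_max if self.running_average() < self.p_avg else 0.0
        self.total_power += power
        self.slots += 1
        return power


policy = CounterBasedPolicy(p_avg=0.5)
trace = [policy.decide(1.0) for _ in range(4)]
print(trace, policy.running_average())
```

Note that the rule never looks at whether the primary user is idle or busy, which is why it can spend its power budget on cooperation and then miss the idle slots, as discussed above.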

In the second set of simulations, we fix the input rate $\lambda_{su} = 0.8$ packets/slot, $V = 500$, and simulate the Frame-Based-Drift-Plus-Penalty-Algorithm over 1000 frames. At the start of the simulation, we set $\lambda_{pu} = 0.4$ packets/slot. The values of the other parameters remain the same. However, during the course of the simulation, we change $\lambda_{pu}$ to 0.2 packets/slot after the first 350 frames and then again to 0.55 packets/slot after the first 700 frames. In Figs. 6 and 7 we plot the running average (over 100 frames) of the secondary user throughput and the average power used for cooperation. These show that the Frame-Based-Drift-Plus-Penalty-Algorithm automatically adapts to the changes in $\lambda_{pu}$. Further, it quickly approaches the optimal performance corresponding to the new $\lambda_{pu}$ by adaptively spending more or less power (as required) on cooperation. For example, when $\lambda_{pu}$ reduces to 0.2 packets/slot after frame number 350, the fraction of time the primary is idle even with no cooperation is $1 - 0.2/0.6 = 0.66$. With $P_{avg} = 0.5$, there is no need to cooperate anymore. This is precisely what the Frame-Based-Drift-Plus-Penalty-Algorithm does as shown in Fig. 7. Similarly, when $\lambda_{pu}$ increases to 0.55 packets/slot after frame number 700, the Frame-Based-Drift-Plus-Penalty-Algorithm starts to spend more power on cooperative transmissions.

Fig. 6. Moving Average of Secondary User Throughput over Frames.

VIII. Conclusions 

In this paper, we studied the problem of opportunistic 
cooperation in a cognitive femtocell network. Specifically, 
we considered the scenario where a secondary user can 
cooperatively transmit with the primary user to increase its 
transmission success probability. In return, the secondary user 
can get more opportunities for transmitting its own data when 
the primary user is idle. A key feature of this problem is that here, the evolution of the system state depends on the control actions taken by the secondary user. This dependence makes it a constrained Markov Decision Problem, traditional solutions to which require either extensive knowledge of the system dynamics or learning based approaches that suffer from large convergence times. However, using the technique of Lyapunov optimization, we designed a novel greedy and online control algorithm that overcomes these challenges and is provably optimal.

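Fig. 7. Moving Average of Power used by the Secondary User for Cooperative Transmissions over Frames.

Appendix A
Proof of Lemma 2

Let $Q_{su}^{fab}(t)$ denote the queue backlog value under the Frame-Based-Drift-Plus-Penalty-Algorithm for all $t \in \{t_k, \ldots, t_{k+1} - 1\}$. Then, since the admission control decision (15) of the Frame-Based-Drift-Plus-Penalty-Algorithm minimizes the term $(Q_{su}(t) - V)R_{su}(t)$ for all $Q_{su}(t)$, we have:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}^{fab}(t) - V)R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\} \geq \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}^{fab}(t) - V)R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \tag{64}$$

Note that we are not implementing the admission control decisions of ALT in the left hand side of the above.

Next, we make use of the following sample path relations in (64) to prove (39). For all $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$, the following hold under any control algorithm:

$$Q_{su}(t_k) \geq Q_{su}(t) - (t - t_k)A_{max} \tag{65}$$

$$Q_{su}(t_k) \leq Q_{su}(t) + (t - t_k)\mu_{max} \tag{66}$$

(65) follows by noting that the maximum number of arrivals to the secondary user queue in the interval $[t_k, \ldots, t)$ is at most $(t - t_k)A_{max}$. Similarly, (66) follows by noting that the maximum number of departures from the secondary user queue in the interval $[t_k, \ldots, t)$ is at most $(t - t_k)\mu_{max}$. Using (65) in the left hand side of (64) yields:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}^{fab}(t) - V)R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\} \leq \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\} + \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (t - t_k)A_{max}R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\}$$

Using the fact that $R_{su}^{alt}(t) \leq A_{max}$ and $\sum_{t=t_k}^{t_{k+1}-1} (t - t_k) = \frac{T[k](T[k]-1)}{2}$, we get:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}^{fab}(t) - V)R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\} \leq \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\} + \frac{D A_{max}^2}{2} \tag{67}$$

Next, using (66) in the right hand side of (64) yields:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}^{fab}(t) - V)R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \geq \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} - \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (t - t_k)\mu_{max}R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\}$$

Again using the fact that $R_{su}^{fab}(t) \leq A_{max}$ and $\sum_{t=t_k}^{t_{k+1}-1} (t - t_k) = \frac{T[k](T[k]-1)}{2}$, we get:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}^{fab}(t) - V)R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \geq \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} - \frac{D \mu_{max} A_{max}}{2} \tag{68}$$

Using (67) and (68) in (64), we have:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\} \geq \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} - C$$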

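Appendix B
Proof of Theorem 1 parts 2 and 3

We prove parts (2) and (3) of Theorem 1 using the technique of Lyapunov optimization. Using (14), a bound on the Lyapunov drift under the Frame-Based-Drift-Plus-Penalty-Algorithm is given by:

$$\begin{aligned}
\Delta(t_k) - V\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \leq{} & B + (Q_{su}(t_k) - V)\,\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} - X_{su}(t_k)\mathbb{E}\{T[k]P_{avg} \mid Q(t_k)\} \\
& - \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} \left( Q_{su}(t_k)\mu_{su}^{fab}(t) - X_{su}(t_k)P_{su}^{fab}(t) \right) \,\middle|\, Q(t_k) \right\}
\end{aligned} \tag{69}$$

Using Lemma 2, we have that:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \leq C + \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\}$$

Next, note that under the ALT algorithm, we have:

$$\frac{\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{alt}(t) \,\middle|\, Q(t_k) \right\}}{\mathbb{E}\{T[k] \mid Q(t_k)\}} \leq \frac{\mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} (Q_{su}(t_k) - V)R_{su}^{stat}(t) \,\middle|\, Q(t_k) \right\}}{\mathbb{E}\{\hat{T}[k] \mid Q(t_k)\}}$$

To see this, we have two cases:

1) $Q_{su}(t_k) > V$: Then, $R_{su}^{alt}(t) = 0$ for all $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$, so that the left hand side above is 0 while the right hand side is $\geq 0$. Hence, the inequality follows.

2) $Q_{su}(t_k) \leq V$: Then, $R_{su}^{alt}(t) = A_{su}(t)$ for all $t \in \{t_k, t_k + 1, \ldots, t_{k+1} - 1\}$, so that the left hand side becomes $(Q_{su}(t_k) - V)\lambda_{su}$ while the right hand side cannot be smaller than $(Q_{su}(t_k) - V)\lambda_{su}$.

Combining these, we get:

$$(Q_{su}(t_k) - V)\,\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \leq C + (Q_{su}(t_k) - V)\,\mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} R_{su}^{stat}(t) \,\middle|\, Q(t_k) \right\} \frac{\mathbb{E}\{T[k] \mid Q(t_k)\}}{\mathbb{E}\{\hat{T}[k] \mid Q(t_k)\}}$$

Finally, since the resource allocation part of the Frame-Based-Drift-Plus-Penalty-Algorithm maximizes the ratio in (16), we have:

$$\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} \left( Q_{su}(t_k)\mu_{su}^{fab}(t) - X_{su}(t_k)P_{su}^{fab}(t) \right) \,\middle|\, Q(t_k) \right\} \geq \mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} \left( Q_{su}(t_k)\mu_{su}^{stat}(t) - X_{su}(t_k)P_{su}^{stat}(t) \right) \,\middle|\, Q(t_k) \right\} \frac{\mathbb{E}\{T[k] \mid Q(t_k)\}}{\mathbb{E}\{\hat{T}[k] \mid Q(t_k)\}}$$

Using these in (69), we have:

$$\begin{aligned}
\Delta(t_k) - V\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \leq{} & B + C + (Q_{su}(t_k) - V)\,\mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} R_{su}^{stat}(t) \,\middle|\, Q(t_k) \right\} \frac{\mathbb{E}\{T[k] \mid Q(t_k)\}}{\mathbb{E}\{\hat{T}[k] \mid Q(t_k)\}} \\
& - \mathbb{E}\left\{ \sum_{t=t_k}^{\hat{t}_{k+1}-1} \left( Q_{su}(t_k)\mu_{su}^{stat}(t) - X_{su}(t_k)P_{su}^{stat}(t) \right) \,\middle|\, Q(t_k) \right\} \frac{\mathbb{E}\{T[k] \mid Q(t_k)\}}{\mathbb{E}\{\hat{T}[k] \mid Q(t_k)\}} \\
& - X_{su}(t_k)\mathbb{E}\{T[k]P_{avg} \mid Q(t_k)\}
\end{aligned}$$

Using (34)-(36) in the inequality above, we get:

$$\Delta(t_k) - V\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \leq B + C - V\nu^*\mathbb{E}\{T[k] \mid Q(t_k)\} \tag{70}$$

To prove (41), we rearrange (70) to get:

$$\Delta(t_k) \leq B + C - V\nu^*\mathbb{E}\{T[k] \mid Q(t_k)\} + V\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \,\middle|\, Q(t_k) \right\} \leq B + C + V T_{max} A_{max}$$

(41) now follows from Theorem 4.1 of [21]. Since $X_{su}(t_k)$ is mean rate stable, (42) follows from Theorem 2.5(b) of [21]. To prove (44), we take expectations of both sides of (70) to get:

$$\mathbb{E}\{L(Q(t_{k+1}))\} - \mathbb{E}\{L(Q(t_k))\} - V\mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \right\} \leq B + C - V\nu^*\mathbb{E}\{T[k]\}$$

Summing over $k \in \{1, 2, \ldots, K\}$, dividing by $V$, and rearranging yields:

$$\sum_{k=1}^{K} \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \right\} \geq \nu^* \sum_{k=1}^{K} \mathbb{E}\{T[k]\} - \frac{(B + C)K}{V}$$

where we used the fact that $\mathbb{E}\{L(Q(t_{K+1}))\} \geq 0$ and $\mathbb{E}\{L(Q(t_1))\} = 0$. From this, we have:

$$\frac{\sum_{k=1}^{K} \mathbb{E}\left\{ \sum_{t=t_k}^{t_{k+1}-1} R_{su}^{fab}(t) \right\}}{\sum_{k=1}^{K} \mathbb{E}\{T[k]\}} \geq \nu^* - \frac{(B + C)K}{V \sum_{k=1}^{K} \mathbb{E}\{T[k]\}} \geq \nu^* - \frac{B + C}{V\,T_{min}}$$

since $\sum_{k=1}^{K} \mathbb{E}\{T[k]\} \geq K\,T_{min}$. This proves (44).

Appendix C
Computing D

Here, we compute a finite $D$ that upper bounds $\mathbb{E}\{T^2[k]\}$. First, note that $\mathbb{E}\{T^2[k]\}$ would be maximum when the secondary user never cooperates. Next, let $I[k]$ and $B[k]$ denote the lengths of the primary user idle and busy periods, respectively, in the $k$th frame. Thus, we have $T[k] = I[k] + B[k]$.

In the following, we drop $[k]$ from the notation for convenience. Using the independence of $I$ and $B$, we have:

$$\mathbb{E}\{T^2\} = \mathbb{E}\{I^2\} + \mathbb{E}\{B^2\} + 2\mathbb{E}\{I\}\mathbb{E}\{B\}$$

We note that $I$ is a geometric r.v. with parameter $\lambda_{pu}$. Thus, $\mathbb{E}\{I\} = 1/\lambda_{pu}$ and $\mathbb{E}\{I^2\} = (2 - \lambda_{pu})/\lambda_{pu}^2$. To calculate $\mathbb{E}\{B\}$, we apply Little's Theorem to get:

$$\frac{\mathbb{E}\{I\}}{\mathbb{E}\{I\} + \mathbb{E}\{B\}} = 1 - \frac{\lambda_{pu}}{\phi_{nc}}$$

This yields $\mathbb{E}\{B\} = 1/(\phi_{nc} - \lambda_{pu})$. To calculate $\mathbb{E}\{B^2\}$, we use the observation that changing the service order of packets in the primary queue to preemptive LIFO does not change the length of the busy period $B$. However, with LIFO scheduling, $B$ now equals the duration that the first packet stays in the queue. Next, suppose there are $N$ packets that interrupt the service of the first packet. Let these be indexed as $\{1, 2, \ldots, N\}$. We can relate $B$ to the service time $X$ of the first packet and the durations for which all these other packets stay in the queue as follows:

$$B = X + \sum_{i=1}^{N} B_i \tag{71}$$

Here, $B_i$ denotes the duration for which packet $i$ stays in the queue. Using the memoryless property of the i.i.d. arrival process of the primary packets as well as the i.i.d. nature of the service times, it follows that all the r.v.'s $B_i$ are i.i.d. with the same distribution as $B$. Further, they are independent of $N$. Squaring (71) and taking expectations, we get:

$$\mathbb{E}\{B^2\} = \mathbb{E}\{X^2\} + 2\mathbb{E}\{X\}\mathbb{E}\{N\}\mathbb{E}\{B\} + \mathbb{E}\left\{ \left( \sum_{i=1}^{N} B_i \right)^2 \right\} \tag{72}$$

Note that $X$ is a geometric r.v. with parameter $\phi_{nc}$. Thus $\mathbb{E}\{X\} = 1/\phi_{nc}$ and $\mathbb{E}\{X^2\} = (2 - \phi_{nc})/\phi_{nc}^2$. Also, $\mathbb{E}\{N\} = \lambda_{pu}\mathbb{E}\{X\} = \lambda_{pu}/\phi_{nc}$. Using these in (72), we have:

$$\mathbb{E}\{B^2\} = \frac{(2 - \phi_{nc})}{\phi_{nc}^2} + \frac{2\lambda_{pu}}{\phi_{nc}^2(\phi_{nc} - \lambda_{pu})} + \mathbb{E}\left\{ \left( \sum_{i=1}^{N} B_i \right)^2 \right\}$$

To calculate the last term, we have:

$$\mathbb{E}\left\{ \left( \sum_{i=1}^{N} B_i \right)^2 \right\} = \mathbb{E}\left\{ \sum_{i=1}^{N} B_i^2 \right\} + 2\mathbb{E}\left\{ \sum_{i \neq j} B_i B_j \right\} = \mathbb{E}\{N\}\mathbb{E}\{B^2\} + 2(\mathbb{E}\{B\})^2(\mathbb{E}\{N^2\} - \mathbb{E}\{N\})$$

Note that given $X = x$, $N$ is a binomial r.v. with parameters $(x, \lambda_{pu})$. Thus, we have:

$$\begin{aligned}
\mathbb{E}\{N^2\} &= \sum_{x \geq 1} \mathbb{E}\{N^2 \mid X = x\}\,\text{Prob}[X = x] \\
&= \sum_{x \geq 1} \left[ (x\lambda_{pu})^2 + x\lambda_{pu}(1 - \lambda_{pu}) \right] \phi_{nc}(1 - \phi_{nc})^{x-1} \\
&= \lambda_{pu}^2 \sum_{x \geq 1} x^2 \phi_{nc}(1 - \phi_{nc})^{x-1} + \lambda_{pu}(1 - \lambda_{pu}) \sum_{x \geq 1} x\,\phi_{nc}(1 - \phi_{nc})^{x-1} \\
&= \lambda_{pu}^2 \frac{(2 - \phi_{nc})}{\phi_{nc}^2} + \lambda_{pu}(1 - \lambda_{pu})\frac{1}{\phi_{nc}}
\end{aligned}$$

Using this, we have:

$$\mathbb{E}\left\{ \left( \sum_{i=1}^{N} B_i \right)^2 \right\} = \frac{\lambda_{pu}}{\phi_{nc}}\mathbb{E}\{B^2\} + 2\left( \frac{1}{\phi_{nc} - \lambda_{pu}} \right)^2 (\mathbb{E}\{N^2\} - \mathbb{E}\{N\}) = \frac{\lambda_{pu}}{\phi_{nc}}\mathbb{E}\{B^2\} + \frac{4\lambda_{pu}^2(1 - \phi_{nc})}{\phi_{nc}^2(\phi_{nc} - \lambda_{pu})^2}$$

Using this, we have:

$$\mathbb{E}\{B^2\} = \frac{(2 - \phi_{nc})}{\phi_{nc}^2} + \frac{2\lambda_{pu}}{\phi_{nc}^2(\phi_{nc} - \lambda_{pu})} + \frac{\lambda_{pu}}{\phi_{nc}}\mathbb{E}\{B^2\} + \left( \frac{1}{\phi_{nc} - \lambda_{pu}} \right)^2 \frac{4\lambda_{pu}^2(1 - \phi_{nc})}{\phi_{nc}^2}$$

Simplifying this yields:

$$\mathbb{E}\{B^2\} = \frac{(2 - \phi_{nc})}{\phi_{nc}(\phi_{nc} - \lambda_{pu})} + \frac{2\lambda_{pu}}{\phi_{nc}(\phi_{nc} - \lambda_{pu})^2} + \frac{4\lambda_{pu}^2(1 - \phi_{nc})}{\phi_{nc}(\phi_{nc} - \lambda_{pu})^3}$$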
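The closed form for $\mathbb{E}\{N^2\}$ used above can be checked by direct summation. This is a small numerical sketch, assuming $X$ is geometric on $\{1, 2, \ldots\}$ with parameter $\phi_{nc}$ and $N$ given $X = x$ is binomial with parameters $(x, \lambda_{pu})$; the truncation at 2000 terms and the parameter values are choices made here.

```python
def second_moment_N(lam, phi, terms=2000):
    """Sum E{N^2 | X = x} * Prob[X = x] over x = 1..terms."""
    total = 0.0
    for x in range(1, terms + 1):
        p_x = phi * (1.0 - phi) ** (x - 1)             # geometric pmf
        cond = (x * lam) ** 2 + x * lam * (1.0 - lam)  # binomial 2nd moment
        total += cond * p_x
    return total


lam, phi = 0.5, 0.6
numeric = second_moment_N(lam, phi)
closed = lam**2 * (2 - phi) / phi**2 + lam * (1 - lam) / phi
gap = closed - lam / phi  # E{N^2} - E{N}
print(numeric, closed, gap, 2 * lam**2 * (1 - phi) / phi**2)
```

The numerical sum matches the closed form, and the difference $\mathbb{E}\{N^2\} - \mathbb{E}\{N\}$ equals $2\lambda_{pu}^2(1 - \phi_{nc})/\phi_{nc}^2$, which is the quantity that produces the $4\lambda_{pu}^2(1 - \phi_{nc})$ term in the derivation.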